ABSTRACT


This thesis shows that in many practical situations the processing of a discrete-time signal can be accomplished using only the magnitude of its short-time spectrum. Mild restrictions on the signal and on the analysis window of the short-time spectrum are shown to be sufficient for unique signal representation with the short-time spectral magnitude. Furthermore, various algorithms are developed which reconstruct the signal from appropriate samples of the short-time spectral magnitude. Some of these algorithms are designed to obtain signal estimates from the processed short-time spectral magnitude, which generally does not have a valid short-time structure. These algorithms are successfully applied to the time-scale modification and noise reduction problems in speech processing. However, the results presented here have similar potential for other application areas, including those with multidimensional signals.
CHAPTER ONE: SIGNAL PROCESSING USING ONLY SHORT-TIME SPECTRAL MAGNITUDE

The time-invariance of spectral processing [1] is a disadvantage in several applications, particularly those involving speech and images. For example, a speech waveform consists of voiced and unvoiced sections [2]. The voiced sections have a periodic structure, whereas the unvoiced sections consist mainly of wideband random noise. The processing requirements of these two types of sections are often quite different. In voiced sections, for example, it is important to preserve the periodicity, but no such restriction applies to unvoiced sections. On the other hand, in unvoiced sections it is often essential to preserve the wideband random noise characteristic. Even within the various voiced or unvoiced sections, the signal properties tend to change. For example, within voiced sections, both the length and the shape of each period generally change as a function of time. In fact, speech characteristics such as periodicity are generally assumed to be constant only over short durations, on the order of 20 milliseconds [2]. In many cases, therefore, it is inadvisable to apply time-invariant processing to speech over intervals much longer than 20 milliseconds.

To achieve a degree of time dependence in the processing of signals such as speech, spectral processing is often applied independently to various short-time sections of a signal. This type of processing is usually based on the short-time spectrum [3]. In section 1.1 of this chapter, we present the definition of the short-time spectrum for discrete-time signals. Signal processing applications usually require both the magnitude and the phase of the short-time spectrum. However, as we shall see in section 1.2, there are some applications where it is desirable to accomplish the processing with only the magnitude of the short-time spectrum. This has previously not been possible because of the lack of practically useful results on the relationship between the short-time spectral magnitude and the corresponding signal. In particular, it is important to develop results on signal reconstruction from the magnitude of the short-time spectrum. Furthermore, since a processed short-time spectral magnitude may not necessarily correspond to any signal, we would like to be able to obtain reasonable signal estimates in such cases. This thesis presents a number of important results on these problems that make possible the practical implementation of signal processing using only the magnitude of the short-time spectrum. Some previous investigations on this subject are described in section 1.3. Finally, in section 1.4, we outline the major results of this thesis.

1.1 Short-Time Spectrum

The short-time spectrum has been developed for continuous-time as well as discrete-time signals. Excellent references on the subject include the work of C. Weinstein [3], J. Allen [4], and M. Portnoff [5]. In this thesis, we are interested in discrete-time signal processing with the short-time spectrum. For a discrete-time signal $x(n)$, the short-time spectrum is a function of time as well as frequency. It is mathematically expressed as

$$X_w(nL,\omega) = \sum_{m=-\infty}^{\infty} x(m)\,w(nL-m)\,e^{-j\omega m} \tag{1.1}$$

where the subscript $w$ in $X_w(nL,\omega)$ denotes the analysis window, $w(n)$. The parameter $L$ is an integer which denotes the separation in time between adjacent short-time sections. This parameter is independent of time and is selected so as to ensure a degree of time overlap between adjacent short-time sections. For a fixed value of $n$, the short-time spectrum $X_w(nL,\omega)$ defined in (1.1) represents the Fourier transform with respect to $m$ of the short-time section $f_n(m) = x(m)\,w(nL-m)$. The sliding window interpretation [5] views $X_w(nL,\omega)$ as being generated by shifting the time-reversed analysis window across the signal. After each shift of $L$ samples, the window is multiplied with the signal and the Fourier transform is applied to the product. There are other interpretations of the short-time spectrum, including a well-known filter bank interpretation [2]. However, for the purposes of this thesis, we find the sliding window interpretation to be the most appropriate.
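As an illustration of the sliding window interpretation, the following Python sketch evaluates (1.1) directly at the $K$ frequencies $\omega_k = 2\pi k/K$ for every window position $n$ that overlaps the signal. It is a minimal sketch and not an implementation from this thesis. The function name `short_time_spectrum`, the use of NumPy, and the convention that $x(n)$ and $w(n)$ are finite-length arrays starting at index 0 are all assumptions made for illustration.

```python
import numpy as np

def short_time_spectrum(x, w, L, K):
    """Evaluate X_w(nL, w_k) of (1.1) at w_k = 2*pi*k/K (illustrative sketch).

    Assumes x(m) is nonzero only for 0 <= m < len(x) and w(n) only for
    0 <= n < len(w). Row n of the result holds the Fourier transform of the
    short-time section f_n(m) = x(m) w(nL - m); column k is frequency w_k.
    """
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    m = np.arange(len(x))
    k = np.arange(K)
    # e^{-j w_k m}: rows are frequencies, columns are time indices m
    E = np.exp(-2j * np.pi * np.outer(k, m) / K)
    # last window position n that still overlaps the signal
    n_max = (len(x) + len(w) - 2) // L
    rows = []
    for n in range(n_max + 1):
        idx = n * L - m                          # argument of the time-reversed window
        inside = (idx >= 0) & (idx < len(w))
        f_n = np.zeros(len(x))
        f_n[inside] = x[inside] * w[idx[inside]]  # short-time section f_n(m)
        rows.append(E @ f_n)                     # Fourier transform with respect to m
    return np.array(rows)
```

The sketch uses direct evaluation rather than an FFT. This keeps the phase reference at $m = 0$ exactly as in (1.1), at the cost of speed. Applying `np.abs` to the rows gives samples of the short-time spectral magnitude.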

For most signals and analysis windows, the short-time sections $f_n(m)$ do not have any symmetry with respect to the origin. Consequently, the short-time spectrum is generally a complex function. In many short-time spectral processing applications, both the magnitude and the phase of the short-time spectrum are used. As illustrated in the next section, however, it is important to determine whether short-time spectral processing can be accomplished using only the magnitude of the short-time spectrum.

1.2 Applications for Magnitude-Only Processing

In this section, we consider two important applications that illustrate the importance of developing practical signal processing techniques which use only the magnitude of the short-time spectrum. Specifically, we consider the problems of noise reduction and time-scale modification. These problems are stated mostly in the context of speech processing. However, it will be clear from the discussion that the same concepts also apply to other applications.

We first consider the problem of noise reduction. Suppose that a discrete-time signal $x(n)$ is the sum of a desired signal $s(n)$ and a noise signal $e(n)$. The signal $x(n)$ may, for example, represent samples of a noisy speech recording. If $e(n)$ originates from a random process that can be appropriately modelled as stationary, there are some classical spectral processing methods for noise reduction. These include Wiener filtering, power spectrum filtering, and spectral subtraction. A comprehensive survey of such processing is contained in a paper by Lim and Oppenheim on noise reduction for speech signals [6]. These noise reduction procedures have the property that they process only the magnitude of the signal spectrum. The spectral phase of the noisy signal $x(n)$ is retained in the processed signal. For applications such as speech processing, these techniques perform better when applied to the short-time spectrum than to the (long-time) spectrum of $x(n)$. This way, each short-time section can be filtered according to its own spectral characteristics. Of course, only the spectral magnitudes of the short-time sections are affected by the processing. A problem of interest is whether an estimate of the short-time spectral phase can be obtained from the processed magnitude of the short-time spectrum. This is equivalent to estimating the processed signal from the short-time spectral magnitude alone. Thus, besides providing an estimate of the processed short-time spectral phase, such a technique would not require any spectral phase information on the noisy signal.

Time-scale modification of signals is another area where short-time spectral processing plays an important role. The basic problem is to compress or expand the signal without changing its short-time spectral characteristics. In other words, the rate of change of the spectral characteristics is to be modified without significantly changing the principal frequency locations of spectral energy within the various short-time sections. In speech, such processing corresponds to a change in the apparent rate of articulation without any appreciable degradation of perceptual quality. M. Portnoff [7] has developed a time-scale modification technique based on the short-time spectrum. The technique applies a linear time-scaling to the short-time spectrum. It then divides an estimate of the unwrapped phase [1] of the short-time spectrum by a factor proportional to the desired rate of time compression or expansion. Finally, the processed short-time spectrum is used to synthesize an estimate of the time-scale modified signal. Throughout this technique, both the magnitude and the phase of the short-time spectrum are used. However, if signal estimation could be done directly from the processed short-time spectral magnitude, the phase processing would be avoided. Thus, in this case, a major incentive for developing techniques for signal estimation from short-time spectral magnitude is to avoid the computational expense associated with phase processing.


1.3 Previous Investigations

The magnitude of the short-time spectrum was the subject of investigations even before the short-time spectrum itself. In particular, researchers were motivated to study the short-time spectral magnitude because it was physically easier to estimate for signals such as speech. The first formal definition of the short-time spectral magnitude was introduced by R. Fano [8] in his studies on speech analysis. Investigations by R. Fano, by M. Schroeder and B. Atal [9], and by A. Kharkevich [10] were responsible for developing many aspects of short-time spectral analysis.
C. Weinstein [3] formally showed that continuous-time as well as discrete-time signals can be uniquely determined from the short-time spectrum to within a scale factor. Other investigators, such as J. Allen [4] and M. Portnoff [5], have further refined the results obtained by C. Weinstein. For example, several different procedures have been established for signal reconstruction from the short-time spectrum. Portnoff has recently introduced an elegant approach to signal reconstruction from the short-time spectrum. This approach includes previously established reconstruction procedures as special cases. As a consequence of all these studies, the short-time spectrum has become a very useful signal representation for various signal processing purposes.

The question of unique signal representation with the magnitude of the short-time spectrum has remained mostly unresolved. A study by Weinstein [3] showed the uniqueness of the short-time spectral magnitude only for a very restricted class of signals and analysis windows. In particular, his approach was based on the property that minimum phase signals are uniquely specified by their spectral magnitude. He observed that if each short-time section were minimum phase, we could uniquely reconstruct the short-time sections from their spectral magnitudes. If there is sufficient overlap between short-time sections, all the samples of the original signal may then be obtained by dividing out the analysis window from the various short-time sections.

More recently, an alternative approach to unique signal representation with short-time spectral magnitude was used by R. Altes [11]. This approach places no restriction on the signal to be represented. However, the analysis window is required to satisfy a condition which in practice means that the analysis window has to be longer than the signal. These results were obtained using a relationship between the short-time spectral magnitude and an ambiguity function, which for a discrete-time signal $x(n)$ is defined as

$$A_x(n,\omega) = \sum_{m=-\infty}^{\infty} x(m)\,x(n-m)\,e^{j\omega m} \tag{1.2}$$

The results derived by R. Altes show the following. If the analysis window $w(n)$ is such that $A_w(n,\omega) \neq 0$ for every pair $(n,\omega)$, then the signal $x(n)$ can be uniquely determined up to a sign factor from the magnitude of $X_w(n,\omega)$. Furthermore, if $x(n)$ is restricted to be a finite-length signal, then the requirement on $w(n)$ is that $A_w(n,\omega)$ must not be zero for values of $n$ for which $A_x(n,\omega)$ is nonzero. From (1.2), it can easily be observed that the time duration of the ambiguity function is proportional to the time duration of the signal it represents. Hence, this approach gives conditions that are sufficient for unique signal representation with short-time spectral magnitude only for cases where the analysis window is longer than the signal being represented.
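The duration argument above can be checked numerically. The sketch below evaluates (1.2) at $\omega_k = 2\pi k/K$ for a real, finite-length $x(n)$ supported on $0 \le m < N_x$. For such a signal, the product $x(m)\,x(n-m)$ can be nonzero only for $0 \le n \le 2(N_x - 1)$, so the ambiguity function spans $2N_x - 1$ lags. The function name and the NumPy-based evaluation are assumptions for illustration and are not taken from Altes' work [11].

```python
import numpy as np

def ambiguity(x, K):
    """Evaluate A_x(n, w_k) = sum_m x(m) x(n - m) e^{j w_k m} from (1.2),
    with w_k = 2*pi*k/K, for the lags n = 0, ..., 2*(len(x) - 1).

    Assumes x(m) is real and nonzero only for 0 <= m < len(x), so every
    other lag is identically zero.
    """
    x = np.asarray(x, dtype=float)
    Nx = len(x)
    m = np.arange(Nx)
    E = np.exp(2j * np.pi * np.outer(m, np.arange(K)) / K)   # e^{j w_k m}
    A = np.zeros((2 * Nx - 1, K), dtype=complex)
    for n in range(2 * Nx - 1):
        idx = n - m
        inside = (idx >= 0) & (idx < Nx)
        prod = np.zeros(Nx)
        prod[inside] = x[inside] * x[idx[inside]]            # x(m) x(n - m)
        A[n] = prod @ E
    return A
```

For a three-sample signal, the result has five lags. At $\omega = 0$, the lags reduce to the linear convolution of $x(n)$ with itself.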

In short-time spectral processing, we are generally interested in analysis windows that are much shorter than the signal to be processed. In this thesis, we present results which show that the uniqueness of the short-time spectral magnitude for signal representation can also be extended to such cases.

1.4 Outline of Thesis

In this thesis, we show that in many practical situations a discrete-time signal is uniquely represented by its short-time spectral magnitude. The key assumption in these results is that the analysis window is a known finite-length sequence. In such cases, we will see that if there is sufficient overlap between short-time sections, the problem of determining a signal from its short-time spectral magnitude requires certain results on the extrapolation of finite-length signals from (long-time) spectral magnitude. Such results are derived in chapter 2 of this thesis. They are then used in chapter 3 for developing conditions under which the short-time spectral magnitude is a unique signal representation.

For practical applications, it is necessary to obtain algorithms for signal reconstruction from samples of the short-time spectral magnitude. In chapter 4, we present a number of such algorithms with various implementation properties. These algorithms have been successfully implemented for the reconstruction of speech signals. They also offer similar potential for other application areas, including those with multidimensional signals.

In general, processing the short-time spectral magnitude results in a function which does not correspond to the short-time spectral magnitude of any signal. An important contribution of this thesis is the development of signal reconstruction algorithms that yield reasonable signal estimates from the processed short-time spectral magnitude. Some general issues involved in applying the signal reconstruction algorithms to the processed short-time spectral magnitude are discussed in chapter 5.

The final chapters of this thesis consider the application of the various ideas in chapters 2 to 5 to the problems of time-scale modification and noise reduction, particularly in the context of speech processing. For time-scale modification, we have implemented a procedure whose performance is comparable to previous systems based on both the magnitude and the phase of the short-time spectrum. In contrast to those systems, however, our technique has significantly less computational complexity. Furthermore, we have also implemented a short-time spectral processing technique for noise reduction that estimates the processed short-time spectral phase from the processed short-time spectral magnitude. The performance thus obtained appears comparable to that obtained with techniques that require both the magnitude and the phase of the short-time spectrum of the noisy signal.
CHAPTER TWO: SIGNAL EXTRAPOLATION FROM SPECTRAL MAGNITUDE

In this chapter, we derive theorems on the extrapolation of discrete-time signals from their (long-time) spectral magnitude. Besides being important theoretical results in their own right, these theorems play a central role in deriving conditions under which the short-time spectral magnitude is a unique signal representation.

In discrete-time signal extrapolation, a signal $x(n)$ known up to $n = n'$ is extended for $n > n'$, maintaining consistency with all a priori knowledge of $x(n)$. The signals considered in this chapter are known to be zero outside an interval $0 \le n \le N$ for some positive integer $N$. The particular location of this interval on the $n$-axis is for notational convenience only. None of the results derived in this chapter are affected by any shift in this location. Given $x(n)$ for $0 \le n \le M$, where $M < N$, we wish to extrapolate $x(n)$ up to $n = N$, using the spectral magnitude $|X(\omega)|$, where

$$X(\omega) = \sum_{n=0}^{N} x(n)\,e^{-j\omega n} \tag{2.1}$$

Furthermore, we are interested in determining conditions under which the extrapolation is unique. Section 2.2 derives two theorems on such extrapolation for the case in which only the sample $x(N)$ is unknown. This is referred to as single-sample extrapolation. Section 2.3 presents a theorem for the more general case, where several samples of $x(n)$ are extrapolated. These theorems are used extensively in Chapter 3 for deriving conditions under which the short-time spectral magnitude is a unique signal representation. The relationship between the theorems in this chapter and the uniqueness of the short-time spectral magnitude is discussed in the following section.
2.1 Relation to Short-Time Spectral Magnitude

For a signal $x(n)$ and a positive integer $L$, the short-time spectral magnitude is given by

$$S_w(nL,\omega) = \left|\sum_{m=-\infty}^{\infty} x(m)\,w(nL-m)\,e^{-j\omega m}\right|^2 \tag{2.2}$$

where the subscript $w$ in $S_w(nL,\omega)$ refers to the signal $w(n)$, known as the analysis window. In the sliding-window interpretation [5] of (2.2), the time-reversed analysis window $w(-n)$ shifts along the $n$-axis. After each shift of $L$ samples, the shifted window is multiplied with $x(n)$, and each product is called a short-time section of $x(n)$. The spectral magnitude of the short-time section for a particular window shift of $n_0 L$ gives the frequency variation of $S_w(nL,\omega)$ for $n = n_0$. The extent of any particular analysis window position is defined as the region outside which the samples of the window are all zero. The overlap of two analysis window positions is then defined as the intersection of their extents. Note that when $L$ takes its minimum value of 1, adjacent analysis window positions have the maximum overlap possible among the allowable positive integer values of $L$. In this case, the short-time spectral magnitude is said to be computed with maximum analysis window overlap. Finally, when $L > 1$, the short-time spectral magnitude is said to be computed with partial analysis window overlap.


If there were a unique correspondence between signals and their spectral magnitudes, the various short-time sections of $x(n)$ could be uniquely determined from their spectral magnitudes in $S_w(nL,\omega)$. However, the theory of all-pass spectral transformations [1] tells us that a signal is not uniquely specified by its spectral magnitude. For example, $x(n)$ and $x(-n)$ have the same spectral magnitude. More generally, when any poles and zeros of $x(n)$ are replaced by the inverses of their complex conjugates, the result after appropriate scaling is a signal $y(n)$ with the same spectral magnitude as $x(n)$. If any of the replaced poles and zeros is not on the unit circle, $y(n)$ is different from $x(n)$. Fortunately, $S_w(nL,\omega)$ has additional information about the short-time sections besides their spectral magnitudes. This information is contained in the overlap of the analysis window positions. For example, if one of the short-time sections is known, then any signal corresponding to the spectral magnitude of an adjacent section has to be consistent with the known short-time section in their region of overlap. That is, the two sections should be identical in that region after dividing each of their non-zero samples by the corresponding samples of the analysis window. We will show in this chapter that, under suitable conditions, the samples in the region of overlap can be uniquely extrapolated to obtain the entire unknown section.
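A small numerical check makes the overlap constraint concrete. The sketch below builds short-time sections $x(m)\,w(nL-m)$ as arrays over $m$. It then tests a candidate for section $n+1$ against the known section $n$ by dividing out the window on their overlap. A time-reversed copy of the true section has the same spectral magnitude but fails this test. The helper names (`window_at`, `section`, `agrees_on_overlap`) and the use of NumPy are hypothetical choices for illustration only.

```python
import numpy as np

def window_at(w, idx):
    """Return w(idx) for a window stored on 0 <= idx < len(w), zero elsewhere."""
    w = np.asarray(w, dtype=float)
    out = np.zeros(len(idx))
    ok = (idx >= 0) & (idx < len(w))
    out[ok] = w[idx[ok]]
    return out

def section(x, w, L, n):
    """Short-time section x(m) w(nL - m), stored over m = 0, ..., len(x) - 1."""
    x = np.asarray(x, dtype=float)
    m = np.arange(len(x))
    return x * window_at(w, n * L - m)

def agrees_on_overlap(known, candidate, w, L, n):
    """True if `candidate` (window position n + 1) matches `known` (position n)
    after dividing out the window on the overlap of the two positions."""
    m = np.arange(len(known))
    wa = window_at(w, n * L - m)
    wb = window_at(w, (n + 1) * L - m)
    both = (wa != 0) & (wb != 0)   # overlap of the two window extents
    return np.allclose(known[both] / wa[both], candidate[both] / wb[both])
```

For example, with $L = 1$ and a four-sample window, section 4 occupies $1 \le m \le 4$. Reversing it within that range preserves its spectral magnitude but breaks agreement with section 3 on $1 \le m \le 3$.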

Suppose $S_w(nL,\omega)$ is computed under conditions such that knowledge of any short-time section leads to the unique extrapolation of its neighboring short-time sections. Then, knowledge of just one particular short-time section triggers a series of extrapolations. As each new short-time section is extrapolated, it becomes possible to extrapolate a succeeding short-time section that overlaps the one just extrapolated. Once all the short-time sections have been determined in this way, the final step is to combine these sections to obtain the entire signal. Chapter 3 uses exactly such an extrapolation approach to determine conditions under which $S_w(nL,\omega)$ is a unique signal representation.

From the above discussion, it follows that the major theoretical problem in establishing a unique correspondence between $x(n)$ and $S_w(nL,\omega)$ is one of signal extrapolation. Specifically, we wish to extrapolate a short-time section beyond its known samples, using its spectral magnitude. If the analysis window has finite extent, the resulting problem is equivalent to the extrapolation problem considered in this chapter.

2.2 Single-Sample Extrapolation

Consider a discrete-time signal $x(n)$ that is zero outside the interval $0 \le n \le N$. Theorems 2.1 and 2.2 of this section show that the sample $x(N)$ can be uniquely obtained from the spectral magnitude $|X(\omega)|$ and from $x(n)$ for $0 \le n < N$. However, the two theorems differ from each other in the number of samples of $x(n)$ and the number of samples of $|X(\omega)|$ actually used to accomplish the extrapolation. Compared to Theorem 2.2, Theorem 2.1 requires fewer samples of $x(n)$. On the other hand, Theorem 2.2 requires fewer samples of $|X(\omega)|$ than Theorem 2.1.


Theorem 2.1

Let $x(n)$ be a sequence that is zero outside the interval $0 \le n \le N$. Suppose $x(0)$ is nonzero. Then, $2N$ or more uniformly spaced samples of $|X(\omega)|$ over one period of $2\pi$ and the sample $x(0)$ uniquely specify the sample $x(N)$.


Proof: 

The autocorrelation function $R(n)$ of $x(n)$ is obtained from $|X(\omega)|^2$ through the inverse Fourier transform:

$$R(n) = \sum_{m=-\infty}^{\infty} x(m)\,x(n+m) \tag{2.3}$$

Since $x(n)$ is zero for $n < 0$ and for $n > N$, only the $m = 0$ term of (2.3) survives at $n = N$. It follows that (see Figure 2.1)

$$R(N) = x(0)\,x(N) \tag{2.4}$$

Therefore, since $x(0)$ is assumed known and nonzero,

$$x(N) = \frac{R(N)}{x(0)} \tag{2.5}$$

Note that the autocorrelation value $R(N)$ is the only information derived from $|X(\omega)|$. Since $x(n)$ is $N+1$ points long, $R(n)$ is $2N+1$ points long and an even function of $n$. Thus, the entire sequence $R(n)$ can be obtained without aliasing from a $(2N+1)$-point Inverse Discrete Fourier Transform (IDFT) of $|X(\omega)|^2$. However, with a $2N$-point IDFT, the sample $R(N)$ will be aliased with the sample $R(-N)$. Since $R(N) = R(-N)$, it follows that $2R(N)$ can be obtained through a $2N$-point IDFT, requiring only $2N$ uniformly spaced samples of $|X(\omega)|^2$. This completes the proof of Theorem 2.1.
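The proof of Theorem 2.1 translates directly into a few lines of code. The sketch below assumes it is given $x(0)$ and the $2N$ uniformly spaced samples $|X(2\pi k/2N)|$, $k = 0, \ldots, 2N-1$, of a real $x(n)$. It takes a $2N$-point IDFT of their squares, halves bin $N$ to undo the aliasing of $R(-N)$ onto $R(N)$, and applies (2.5). The function name and the NumPy FFT conventions are assumptions made for illustration.

```python
import numpy as np

def extrapolate_last_sample(x0, mag):
    """Recover x(N) from x(0) != 0 and mag[k] = |X(2*pi*k/(2N))|, k = 0..2N-1.

    Sketch of the Theorem 2.1 procedure for a real x(n) supported on 0..N.
    """
    mag = np.asarray(mag, dtype=float)
    N = len(mag) // 2
    r = np.fft.ifft(mag ** 2).real   # 2N-point IDFT: R(n) aliased with period 2N
    R_N = r[N] / 2.0                 # bin N holds R(N) + R(-N) = 2 R(N)
    return R_N / x0                  # eq. (2.5)
```

Only bin $N$ of the IDFT is actually used, which mirrors the remark that $R(N)$ is the only information the proof takes from $|X(\omega)|$.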


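Fig. 2.1 The Computation of $R(N)$. [Figure: $x(n)$, on $0 \le n \le N$, and the shifted sequence $x(n-N)$, on $N \le n \le 2N$, overlap only at $n = N$, so that $R(N) = \sum_{n} x(n)\,x(n-N) = x(0)\,x(N)$.]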
The next theorem also concerns single-sample extrapolation of finite-length signals. However, it uses a totally different approach to extrapolate $x(N)$ from the preceding samples of $x(n)$ and $|X(\omega)|$. In contrast to the proof of Theorem 2.1, the values of $|X(\omega)|^2$ are used directly in the extrapolation instead of first obtaining the autocorrelation $R(n)$ of $x(n)$.


Theorem 2.2

Let $x(n)$ be a sequence that is zero outside the interval $0 \le n \le N$. Assume that there is at least one non-zero sample of $x(n)$ in the interval $0 \le n < N$. Then, $x(n)$ for $0 \le n < N$ and two appropriately chosen samples of $|X(\omega)|$ uniquely specify the sample $x(N)$.


Proof: 


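Let $y(n) = x(n)\,w(n)$, where $w(n)$ is given by

$$w(n) = \begin{cases} 1, & 0 \le n \le N-1 \\ 0, & \text{otherwise} \end{cases} \tag{2.6}$$

Let $Y(\omega)$ denote the spectrum of $y(n)$. Then,

$$X(\omega) = Y(\omega) + x(N)\,e^{-j\omega N} \tag{2.7}$$

Taking the magnitude squared of both sides (with $x(n)$ real) and rearranging the terms,

$$x^2(N) + b(\omega)\,x(N) + c(\omega) = 0 \tag{2.8}$$

where

$$b(\omega) = 2\,\mathrm{Re}\left[Y(\omega)\,e^{j\omega N}\right] \tag{2.9}$$

$$c(\omega) = |Y(\omega)|^2 - |X(\omega)|^2 \tag{2.10}$$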

Note that $b(\omega)$ and $c(\omega)$ can both be determined from $|X(\omega)|$ and the $N$ samples preceding $x(N)$. When (2.8) is solved for $x(N)$, there are two solutions for each value of $\omega$. Consider two distinct values of $\omega$, say $\omega_1$ and $\omega_2$, in the interval $[0,\pi]$. Assume that at least one of $b(\omega)$ and $c(\omega)$ changes when $\omega$ is changed from $\omega_1$ to $\omega_2$. A monic quadratic is completely determined by its pair of roots. Therefore, the two solutions associated with $\omega_1$ cannot be the same as the pair of solutions for $\omega_2$. However, one of the solutions must be common to both pairs, and that is the true value of $x(N)$. It now remains to show the following: provided the $N$ values preceding $x(N)$ are not all zero, one can always find $\omega_1$ and $\omega_2$ for which (2.8) yields two different quadratic equations.

Two values of $\omega$ giving two distinct equations from (2.8) can be found if $b(\omega)$ is not independent of $\omega$. Our approach here will be to show that $b(\omega)$ is independent of $\omega$ in only one case: when the $N$ samples preceding $x(N)$ are all zero. The sequence $y(n)$ falls in the region $0 \le n < N$. Thus, the inverse Fourier transform of $Y(\omega)e^{j\omega N}$, denoted by $\tilde{y}(n)$, falls in the region $-N \le n < 0$. However, for $b(\omega)$ not to depend on $\omega$, the Fourier transform of $\tilde{y}(n)$ must have a constant real part. That is, $\tilde{y}(n)$ must be of the form $A\,\delta(n) + q(n)$, where $A$ is real, $\delta(n)$ is the unit sample sequence, and $q(n)$ is an odd sequence. Since $\tilde{y}(0) = 0$, $A$ must be zero. Moreover, an odd sequence that is zero for all $n \ge 0$ must be zero everywhere. Therefore $b(\omega)$ is independent of $\omega$ only when $\tilde{y}(n) = 0$ for all $n$, i.e., when the $N$ values preceding $x(N)$ are all zero. In this situation, $c(\omega)$ from (2.10) is also independent of $\omega$. Thus, the only situation in which (2.8) does not yield a unique solution for $x(N)$ is when the $N$ samples of $x(n)$ preceding $x(N)$ are all zero.

In fact, such values of $\omega$ can be found even when the various frequency functions are sampled at intervals of $2\pi/M$, where $M \ge 2N$. In such a case, $\tilde{y}(n)$ is replaced by its $M$-point aliased version

$$\tilde{y}_M(n) = \sum_{p=-\infty}^{\infty} \tilde{y}(n+pM)$$

The requirement that $b(2\pi k/M)$ be independent of $k$ then becomes the requirement that $\tilde{y}_M(n)$ be an odd sequence (modulo $M$). It can be verified that $\tilde{y}_M(n)$ is odd if and only if $y(n) = 0$ for all $n$. This completes the derivation.
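A minimal sketch of the Theorem 2.2 procedure follows. It assumes a real $x(n)$ and magnitude samples on an $M$-point grid $\omega_k = 2\pi k/M$ with $M \ge 2N$. It forms $b(\omega_k)$ and $c(\omega_k)$ from (2.9) and (2.10), picks a second frequency at which $b$ differs from its value at $\omega = 0$, and returns the root that the two quadratics (2.8) share. The function name, the tolerance, and the nearest-root matching used to absorb rounding error are illustrative choices and not part of the theorem.

```python
import numpy as np

def extrapolate_by_quadratics(x_known, mag, tol=1e-9):
    """Recover x(N) from x(0..N-1) and mag[k] = |X(2*pi*k/M)|, k = 0..M-1.

    Sketch of Theorem 2.2 for real x(n); assumes M >= 2N.
    """
    y = np.asarray(x_known, dtype=float)          # y(n) = x(n), 0 <= n < N
    N = len(y)
    mag = np.asarray(mag, dtype=float)
    M = len(mag)
    w = 2 * np.pi * np.arange(M) / M
    Y = np.exp(-1j * np.outer(w, np.arange(N))) @ y   # Y(w_k)
    b = 2 * np.real(Y * np.exp(1j * w * N))          # eq. (2.9)
    c = np.abs(Y) ** 2 - mag ** 2                    # eq. (2.10)
    # second frequency where b differs most from b(0)
    j = int(np.argmax(np.abs(b - b[0])))
    if abs(b[j] - b[0]) < tol:
        raise ValueError("b(w) is constant on this grid; x(N) is not unique")
    r1 = np.roots([1.0, b[0], c[0]])                 # roots of (2.8) at w = 0
    r2 = np.roots([1.0, b[j], c[j]])                 # roots of (2.8) at w_j
    # the shared root is the pair of roots that are closest to each other
    d = np.abs(r1[:, None] - r2[None, :])
    i = np.unravel_index(np.argmin(d), d.shape)[0]
    return float(np.real(r1[i]))
```

The non-shared roots of the two quadratics differ by $|b(\omega_j) - b(0)|$, since the two roots of each quadratic sum to $-b$. Maximizing that difference therefore keeps the nearest-root match well separated from the spurious roots.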

2.3 Multiple-Sample Extrapolation

This section presents a theorem on the extrapolation of a finite-length sequence $x(n)$ with more than one unknown sample, using the spectral magnitude $|X(\omega)|$. Once again, $x(n)$ is assumed to be zero outside the interval $0 \le n \le N$. As indicated at the beginning of this chapter, the location of this interval on the $n$-axis may be changed without affecting the results derived here.

It should be noted that Theorem 2.3 below uses the autocorrelation function $R(n)$ of $x(n)$ to determine the unknown samples. This is analogous to the way $x(N)$ was determined from $R(n)$ in the proof of Theorem 2.1. In fact, Theorem 2.1 can be derived as a corollary of Theorem 2.3. However, we chose not to do this in order to emphasize the simplicity of the direct proof of Theorem 2.1.

Theorem 2.3

Let $x(n)$ be a sequence that is zero outside the interval $0 \le n \le N$. Suppose $x(0)$ is non-zero. Consider $2N$ or more uniformly spaced samples of $|X(\omega)|$ over one period of $2\pi$, together with the $P$ samples of $x(n)$ in the interval $0 \le n < P$. These uniquely specify the entire sequence $x(n)$ if and only if $P \ge \lceil M/2 \rceil$, where $M = N+1$ and $\lceil a \rceil$ is the smallest integer greater than or equal to $a$.

Proof:

Throughout this proof, the samples of $x(n)$ for $0 \le n < P$ will be referred to as the initial $P$ samples of $x(n)$. We first provide a counter-example to show that if $P < \lfloor M/2 \rfloor$, then $x(n)$ cannot in general be uniquely specified by $|X(\omega)|$ and the initial $P$ samples.

With $P < \lfloor M/2 \rfloor$, consider any sequence $x(n)$ such that $x(n) = x(M-1-n)$ for $n = 0, 1, \ldots, P-1$, and $x(n) \ne x(M-1-n)$ for at least one $n$ in the range $P \le n \le M-P-1$ (see Figure 2.2). Then, the sequences $x(n)$ and $y(n) = x(M-1-n)$ have the same samples for $n = 0, 1, \ldots, P-1$. Furthermore, since $y(n)$ is a time-reversed version of $x(n)$, the two sequences have the same Fourier transform magnitude [1]. Since $x(n) \ne x(M-1-n)$ for some $n$ in $P \le n \le M-P-1$, $y(n)$ and $x(n)$ are distinct. Thus, the initial $P$ samples and $|X(\omega)|$ are not sufficient to uniquely represent $x(n)$.
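The counter-example can be checked numerically. The short Python sketch below assumes NumPy. Its sample values are hypothetical, chosen only to satisfy the symmetry conditions above. It builds a length-7 sequence with $P = 2 < \lfloor 7/2 \rfloor$ and compares its DFT magnitude with that of its time reversal.

```python
import numpy as np

# Hypothetical instance of the counter-example: M = 7, P = 2 < floor(M/2) = 3
M, P = 7, 2
x = np.array([2.0, -1.0, 0.5, 3.0, -2.0, -1.0, 2.0])
# x(n) = x(M-1-n) for n = 0, 1, but x(2) = 0.5 differs from x(4) = -2.0
y = x[::-1].copy()                   # y(n) = x(M-1-n), the time-reversed sequence

K = 2 * M                            # more than 2N = 2(M-1) frequency samples
mag_x = np.abs(np.fft.fft(x, K))
mag_y = np.abs(np.fft.fft(y, K))

print(np.allclose(mag_x, mag_y))     # True: same |X(omega)| samples
print(np.array_equal(x[:P], y[:P]))  # True: same initial P samples
print(np.array_equal(x, y))          # False: yet the sequences differ
```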

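We now develop a procedure for uniquely recovering the unknown samples of $x(n)$ when $P \ge \lceil M/2 \rceil$. From $2N$ uniform samples of $|X(\omega)|$, we saw in the proof of Theorem 2.1 that the autocorrelation $R(n)$ of $x(n)$ can be obtained:

$$R(n) = x(n) * x(-n) = \sum_{m} x(m)\,x(n+m) \qquad (2.11)$$

Consider the case where $M$ is even. From (2.11), $M/2$ linear equations are obtained in the $M/2$ unknowns $x(M/2), x(M/2+1), \ldots, x(M-1)$. In matrix form these equations are:

$$
\begin{bmatrix}
x(0) & 0 & 0 & \cdots & 0 \\
x(1) & x(0) & 0 & \cdots & 0 \\
x(2) & x(1) & x(0) & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
x(M/2-1) & x(M/2-2) & x(M/2-3) & \cdots & x(0)
\end{bmatrix}
\begin{bmatrix} x(M-1) \\ x(M-2) \\ x(M-3) \\ \vdots \\ x(M/2) \end{bmatrix}
=
\begin{bmatrix} R(M-1) \\ R(M-2) \\ R(M-3) \\ \vdots \\ R(M/2) \end{bmatrix}
\qquad (2.12)
$$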

The left matrix is lower triangular, with all diagonal elements equal to $x(0)$, and its entries are the known initial samples $x(0), \ldots, x(M/2-1)$. Since $x(0) \ne 0$ by assumption, this matrix is invertible. Thus, a unique solution exists for $x(n)$, $n = M/2, M/2+1, \ldots, M-1$. For $M$ odd, the $\lfloor M/2 \rfloor$ unknowns $x((M+1)/2), x((M+1)/2+1), \ldots, x(M-1)$ are solved for through a set of equations similar to (2.12). Thus, for $P \ge \lceil M/2 \rceil$, the uniqueness of $x(n)$ follows regardless of whether $M$ is even or odd.
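The forward substitution implied by (2.12) is easy to sketch. The Python function below is a minimal illustration that assumes NumPy and 0-based indexing. The name `extrapolate_from_magnitude` is hypothetical. For simplicity it uses $2M = 2N+2$ frequency samples, more than the $2N$ the theorem requires, so that the inverse DFT of $|X(\omega)|^2$ returns $R(n)$ for $0 \le n < M$ without aliasing. It solves for $x(M-1), x(M-2), \ldots, x(P)$ in that order. This works for any $P \ge \lceil M/2 \rceil$ and reduces to (2.12) when $P = M/2$.

```python
import numpy as np

def extrapolate_from_magnitude(mag_sq, initial, M):
    """Recover x(0), ..., x(M-1) from samples of |X(omega)|^2 and the initial P samples.

    mag_sq  : |X(2*pi*k/K)|^2 for k = 0, ..., K-1, with K >= 2M - 1 (no aliasing of R(n))
    initial : the known samples x(0), ..., x(P-1), with x(0) != 0
    M       : length of x(n), i.e. N + 1
    """
    P = len(initial)
    if P < (M + 1) // 2:                    # Theorem 2.3 needs P >= ceil(M/2)
        raise ValueError("need at least ceil(M/2) initial samples")
    if initial[0] == 0:
        raise ValueError("x(0) must be non-zero")
    R = np.real(np.fft.ifft(mag_sq))        # R[n] = sum_m x(m) x(n+m), 0 <= n < M
    x = np.zeros(M)
    x[:P] = initial
    # R(k) = x(0) x(k) + sum_{m=1}^{M-1-k} x(m) x(m+k). Each x(m) in the sum is an
    # initial sample, and each x(m+k) lies beyond index k, so it is already solved.
    for k in range(M - 1, P - 1, -1):
        tail = sum(x[m] * x[m + k] for m in range(1, M - k))
        x[k] = (R[k] - tail) / x[0]
    return x


x_true = np.array([1.0, -2.0, 0.5, 3.0, -1.0, 2.0])   # hypothetical, M = 6, N = 5
M = len(x_true)
mag_sq = np.abs(np.fft.fft(x_true, 2 * M)) ** 2        # 2M = 12 frequency samples
P = (M + 1) // 2                                       # ceil(M/2) = 3
x_hat = extrapolate_from_magnitude(mag_sq, x_true[:P], M)
print(np.allclose(x_hat, x_true))                      # True
```

Each step divides by $x(0)$, so the recovered samples can be sensitive to round-off when $|x(0)|$ is small compared with the other samples.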

The one remaining case is when $M$ is odd and $P = \lfloor M/2 \rfloor$. In this case, our theorem asserts that a unique solution for $x(n)$ does not exist. To show this, consider the sequence $x(n)$ to be such that $x(n) = -x(M-1-n)$ for $n = 0, 1, \ldots, P-1$, and $x(P) \ne 0$. Then, the sequences $x(n)$ and $y(n) = -x(M-1-n)$ have the same samples for $n = 0, 1, \ldots, P-1$. Because $M - 1 - P = P$, they also satisfy $y(P) = -x(P)$, so the two sequences are distinct. On the other hand, it is easily seen that $|Y(\omega)| = |X(\omega)|$. This completes the proof of Theorem 2.3.
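The odd-length counter-example can be checked the same way. In the sketch below, NumPy is assumed and the sample values are hypothetical. It uses $M = 5$ and $P = \lfloor M/2 \rfloor = 2$.

```python
import numpy as np

M = 5
P = M // 2                            # P = floor(M/2) = 2, one short of ceil(M/2) = 3
x = np.array([1.0, 2.0, 0.7, -2.0, -1.0])
# x(n) = -x(M-1-n) for n = 0, 1, and x(P) = 0.7 is non-zero
y = -x[::-1]                          # y(n) = -x(M-1-n)

K = 2 * M
print(np.allclose(np.abs(np.fft.fft(x, K)), np.abs(np.fft.fft(y, K))))  # True
print(np.array_equal(x[:P], y[:P]), y[P] == -x[P])                      # True True
```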




CHAPTER  THREE:  SIGNAL  REPRESENTATION  WITH 
SHORT-TIME  SPECTRAL  MAGNITUDE 

In this chapter, we address the problem of uniquely representing a signal by its short-time spectral magnitude. We assume that the analysis window of the short-time spectrum is a known finite-length sequence. This permits us to use the extrapolation theorems of chapter 2 for developing conditions which ensure unique correspondence between a signal and its short-time spectral magnitude. These conditions place restrictions on the finite-length analysis window as well as on the signal being represented. The need for such conditions is discussed in section 3.1. In section 3.2 we present various conditions for unique signal representation with the short-time spectral magnitude. Most of these conditions concern the representation of one-sided signals, that is, signals which are always zero either before (right-sided) or after (left-sided) some point on the time axis. These conditions do not cover all the possible situations in which a signal is uniquely specified by its short-time spectral magnitude. However, the conditions we develop are broad enough to be of significant practical interest, as illustrated in later chapters. This chapter closes with section 3.3, which shows how the uniqueness conditions can be easily extended to the short-time spectral magnitude of multidimensional signals.


3.1  Uniqueness  Problems 

For a signal $x(n)$ and a positive integer $L$, the short-time spectral magnitude is given by

$$S_w(nL,\omega) = \left| \sum_{m=-\infty}^{\infty} x(m)\,w(nL-m)\,e^{-j\omega m} \right|^2 \qquad (3.1)$$

where the subscript $w$ in $S_w(nL,\omega)$ refers to the analysis window, $w(n)$. In the sliding window interpretation [5], $S_w(nL,\omega)$ for each $n$ is viewed as representing the spectral magnitude of the short-time section $f_n(m) = x(m)\,w(nL-m)$. When $L = 1$, the short-time spectral magnitude is said to have maximum analysis window overlap. On the other hand, if $L > 1$, the short-time spectral magnitude has partial analysis window overlap. In this section, we discuss some situations where $x(n)$ is not uniquely represented by $S_w(nL,\omega)$. This helps us select the conditions developed in the remainder of this chapter for ensuring unique specification of $x(n)$ with $S_w(nL,\omega)$.

At least one condition on $x(n)$ is easily shown to be necessary for unique correspondence with the short-time spectral magnitude, $S_w(nL,\omega)$. In expression (3.1) for $S_w(nL,\omega)$, when $x(n)$ is replaced by $-x(n)$, the minus sign is absorbed by the absolute value operation. Thus, $x(n)$ and $-x(n)$ have the same short-time spectral magnitude. This ambiguity may be resolved, for example, by knowing the sign of some non-zero sample of $x(n)$.
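Definition (3.1) can be evaluated directly on a grid of frequencies. The sketch below is a direct, unoptimized transcription of (3.1). It assumes NumPy and arrays indexed from $n = 0$, and the function name `short_time_spectral_mag` is hypothetical. It confirms that $x(n)$ and $-x(n)$ give the same $S_w(nL,\omega)$ for both $L = 1$ and $L = 2$.

```python
import numpy as np

def short_time_spectral_mag(x, w, L, omegas):
    """Evaluate (3.1): S_w(nL, omega) = |sum_m x(m) w(nL - m) exp(-j omega m)|^2.

    x(m) is taken as zero outside 0 <= m < len(x), and w(n) as zero outside
    0 <= n < len(w). Row n of the result holds S_w(nL, omega) for every omega.
    """
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    m = np.arange(len(x))
    kernel = np.exp(-1j * np.outer(omegas, m))           # e^{-j omega m}
    n_frames = (len(x) + len(w)) // L + 1                # until the window slides past x
    S = np.zeros((n_frames, len(omegas)))
    for n in range(n_frames):
        idx = n * L - m                                   # argument of w(nL - m)
        inside = (idx >= 0) & (idx < len(w))
        section = np.where(inside, x * w[np.clip(idx, 0, len(w) - 1)], 0.0)
        S[n] = np.abs(kernel @ section) ** 2
    return S


x = np.array([0.9, -1.3, 0.4, 2.0, -0.7])               # hypothetical signal
w = np.hamming(3)                                        # no zero samples
omegas = np.linspace(0.0, 2.0 * np.pi, 32, endpoint=False)
for L in (1, 2):
    same = np.allclose(short_time_spectral_mag(x, w, L, omegas),
                       short_time_spectral_mag(-x, w, L, omegas))
    print(L, same)                                       # prints "1 True", then "2 True"
```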

In the case of a finite-length analysis window, a gap of zero samples between two non-zero portions of $x(n)$ can also lead to ambiguity in signal representation with $S_w(nL,\omega)$. Suppose $x(n)$ is the sum of two signals, $x_1(n)$ and $x_2(n)$, occupying different regions of the $n$-axis (see Figure 3.1). Suppose that the gap of zeros between $x_1(n)$ and $x_2(n)$ is large enough that there is no analysis window position for which the corresponding short-time section includes non-zero contributions from both $x_1(n)$ and $x_2(n)$. Clearly, in such a situation, the short-time spectral magnitude of $x(n)$ is the sum of the short-time spectral magnitudes of $x_1(n)$ and $x_2(n)$. However, we previously saw that a signal and its negative have the same short-time spectral magnitude. It follows that $x(n)$ has the same short-time spectral magnitude as the signals obtained from the differences $x_1(n) - x_2(n)$ and $x_2(n) - x_1(n)$ (see Figure 3.1). We conclude that if there is a large enough gap of zero samples, there will be sign ambiguities on either side of the gap. Consequently, all the uniqueness conditions developed in this chapter include a restriction on the length of zero gaps between non-zero portions of the signal.
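The zero-gap ambiguity is also easy to reproduce. The sketch below assumes NumPy, and its helper names and sample values are hypothetical. It evaluates $S_w(nL,\omega)$ by taking an FFT of each short-time section. This has the same magnitude as the sum in (3.1), because a time shift only changes the phase. A gap of five zeros with a length-4 Hamming window makes $x_1(n) + x_2(n)$ and $x_1(n) - x_2(n)$ indistinguishable, while a gap of a single zero does not.

```python
import numpy as np

def short_time_mag_sq(x, w, L, n_fft):
    """Row n samples S_w(nL, omega) at n_fft frequencies (x(m) = 0 outside the array)."""
    Nw = len(w)
    rows = []
    for n in range((len(x) + Nw) // L + 1):
        # short-time section x(m) w(nL - m) for m = nL - Nw + 1, ..., nL
        seg = [x[m] * w[n * L - m] if 0 <= m < len(x) else 0.0
               for m in range(n * L - Nw + 1, n * L + 1)]
        rows.append(np.abs(np.fft.fft(seg, n_fft)) ** 2)
    return np.array(rows)


def joined(x1, x2, gap, sign):
    """x1(n), followed by `gap` zeros, followed by sign * x2(n)."""
    return np.concatenate([x1, np.zeros(gap), sign * x2])


w = np.hamming(4)                       # N_w = 4
x1 = np.array([1.0, -0.5, 0.8])         # hypothetical non-zero portions
x2 = np.array([0.6, 1.2])

# gap of 5 zeros: no window position sees samples of both x1 and x2
far = np.allclose(short_time_mag_sq(joined(x1, x2, 5, 1.0), w, 1, 16),
                  short_time_mag_sq(joined(x1, x2, 5, -1.0), w, 1, 16))
# gap of 1 zero (at most N_w - 2): some sections contain samples of both
near = np.allclose(short_time_mag_sq(joined(x1, x2, 1, 1.0), w, 1, 16),
                   short_time_mag_sq(joined(x1, x2, 1, -1.0), w, 1, 16))
print(far, near)                        # True False
```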

In section 3.2 we will see that $S_w(nL,\omega)$ with $L = 1$ uniquely specifies a one-sided signal $x(n)$ under conditions whose only restriction on $x(n)$ is a limit on the size of any zero gaps. The known analysis window is restricted to have no zero samples within its finite length. This condition is satisfied by the commonly used rectangular, triangular, Hamming and Hanning windows. With such analysis windows, we will now see that for $L > 1$ in $S_w(nL,\omega)$, the zero gap restriction on $x(n)$ is not sufficient to guarantee signal specification even up to a sign ambiguity. To show this, we construct a class of sequences that have no zero samples between any two nonzero samples. These sequences are not specified, even up to a sign factor, by $S_w(nL,\omega)$ with $L > 1$ and $w(n)$ a rectangular window whose length is a multiple of $L$.

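For $M \ge 1$, construct $M$ sequences $x_i(n)$, $i = 1, 2, \ldots, M$, where each $x_i(n)$ has finite length $L$ and falls in the region $1 \le n \le L$. Furthermore, constrain the $z$-transform $X_i(z)$ of each $x_i(n)$ to have $Q$ of its zeros from an arbitrarily specified set $\{a_1, a_2, \ldots, a_Q\}$, none of which lie on the unit circle, with $Q < L$. Thus, for each $i$, $X_i(z)$ can be factored as

$$X_i(z) = \prod_{j=1}^{Q} \left(1 - a_j z^{-1}\right) V_i(z) \qquad (3.2)$$

where $V_i(z)$ contains the remaining factors of $X_i(z)$.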

Now, let

$$x(n) = x_1(n) + x_2(n-L) + \cdots + x_M(n-(M-1)L) \qquad (3.3)$$

Then, the $z$-transform of $x(n)$ is

$$X(z) = X_1(z) + z^{-L}X_2(z) + \cdots + z^{-(M-1)L}X_M(z) = \prod_{j=1}^{Q}\left(1 - a_j z^{-1}\right) \sum_{i=0}^{M-1} z^{-iL}\, V_{i+1}(z) \qquad (3.4)$$

which also contains the zeros $a_j$ for $j = 1, 2, \ldots, Q$.

Now consider an analysis window $w_r(n)$ defined for some integer $r > 1$ by

$$w_r(n) = \begin{cases} 1, & 0 \le n < rL \\ 0, & \text{otherwise} \end{cases}$$

Observe that $y_k(n) = x(n)\,w_r(kL-n)$ for any fixed $k$ is given by

$$y_k(n) = x_j(n-(j-1)L) + x_{j+1}(n-jL) + \cdots + x_p(n-(p-1)L) \qquad (3.5)$$

for some consecutive integers $j, j+1, \ldots, p$ determined from the set $\{1, 2, \ldots, M\}$. Clearly, the $z$-transform $Y_k(z)$ of each $y_k(n)$ has $a_1, a_2, \ldots, a_Q$ among its zeros. Thus, if one or more of the $a_j$'s is reflected about the unit circle to $1/a_j^*$, then $|X(\omega)|$ as well as $|Y_k(\omega)|$ for each $k$ remains the same [1]. Thus, there are $2^Q$ distinct sequences with the same $|Y_k(\omega)|^2 = S_w(kL,\omega)$. Of course, from those $2^Q$ sequences another $2^Q$ sequences with the same $S_w(nL,\omega)$ are obtained by forming the negatives of the first $2^Q$ sequences. Thus, there are $2^{Q+1}$ distinct sequences with the same $S_w(nL,\omega)$. Recall that the maximum attainable $Q$ is $L-1$. Also note that each $x_i(n)$ can be chosen to guarantee that $x(n)$ is non-zero over its finite length. Thus, with $L > 1$, a class of sequences with no zeros between nonzero samples has more than just a sign ambiguity in its representation with $S_w(nL,\omega)$. For example, even with $L = 2$ there exist finite-length sequences with no zero samples over their duration such that there is an ambiguity of $2^2 = 4$ in the representation with $S_w(nL,\omega)$.
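The construction in (3.2) through (3.5) can be instantiated for $L = 2$ and $Q = 1$. In the sketch below, NumPy is assumed. The zero $a = 0.5$, the block gains and the helper name are hypothetical choices. Each block is $x_i(n) = c_i[1, -a]$ on $1 \le n \le 2$, so $X_i(z) = c_i z^{-1}(1 - a z^{-1})$. Reflecting the zero to $1/a$ replaces each block by $c_i[a, -1]$. This is a real-$a$ instance of the reflection described above. A leading zero sample places the blocks at $n = 1, 2, \ldots$, so that every window position ending at $n = kL$ covers whole blocks.

```python
import numpy as np

def short_time_mag_sq(x, w, L, n_fft):
    """Row k samples S_w(kL, omega) at n_fft frequencies (x(m) = 0 outside the array)."""
    Nw = len(w)
    rows = []
    for k in range((len(x) + Nw) // L + 1):
        seg = [x[m] * w[k * L - m] if 0 <= m < len(x) else 0.0
               for m in range(k * L - Nw + 1, k * L + 1)]
        rows.append(np.abs(np.fft.fft(seg, n_fft)) ** 2)
    return np.array(rows)


L, r, a = 2, 2, 0.5                     # hypothetical: L = 2, window length rL = 4, zero a
c = np.array([1.0, -2.0, 1.5, 0.7])     # hypothetical non-zero gains for M = 4 blocks

# blocks x_i(n) = c_i [1, -a] on 1 <= n <= 2:  X_i(z) = c_i z^{-1} (1 - a z^{-1})
x = np.concatenate([[0.0]] + [ci * np.array([1.0, -a]) for ci in c])
# zero reflected to 1/a:  X_i(z) = c_i z^{-1} (a - z^{-1})
x_ref = np.concatenate([[0.0]] + [ci * np.array([a, -1.0]) for ci in c])

w = np.ones(r * L)                      # rectangular window w_r(n)
family = [x, x_ref, -x, -x_ref]         # 2^(Q+1) = 4 sequences with Q = 1
S = [short_time_mag_sq(s, w, L, 16) for s in family]
print(all(np.allclose(S[0], Si) for Si in S[1:]))   # True
print(np.count_nonzero(x[1:]), len(x) - 1)          # 8 8: no zeros inside x(n)
```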

We have established that for unique specification of $x(n)$ by $S_w(nL,\omega)$ with $L > 1$, we require additional information on $x(n)$ besides the one-sided and zero gap restrictions. In section 3.2.2, knowledge of the $L$ initial samples of $x(n)$ is found to be sufficient for this purpose. This condition arises naturally from the extrapolation approach used in deriving the various results in the remainder of this chapter.


3.2  Uniqueness  Conditions 

In this section, we present various conditions, together with their derivations, for uniquely representing a signal with its short-time spectral magnitude. The analysis window of the short-time spectrum is assumed to be a known finite-length sequence. The uniqueness conditions presented are sufficient but not necessary to guarantee unique correspondence between a signal and its short-time spectral magnitude. These conditions are divided into two main categories, according to whether or not maximum analysis window overlap is used in the computation of the short-time spectrum.
3.2.1 Maximum Analysis Window Overlap

The short-time spectral magnitude $S_w(nL,\omega)$, defined in (3.1), may be viewed for each $n$ as the spectral magnitude of the short-time section $f_n(m) = x(m)\,w(nL-m)$. When $n$ is incremented by one, the time-reversed analysis window $w(nL-m)$ shifts by $L$ sample positions. Since (3.1) is defined for positive integer values of $L$, it is clear that with $L = 1$, adjacent analysis window positions have maximum overlap. In this case, we denote the short-time spectral magnitude by $S_w(n,\omega)$.

We are interested in developing conditions that guarantee unique signal representation with $S_w(n,\omega)$ when the analysis window is a known finite-length sequence. For this purpose, Theorem 2.1 on single-sample extrapolation of finite-length sequences is extremely useful. For easy reference, we restate this theorem from chapter 2.

Theorem 2.1

Let $x(n)$ be a sequence that is zero outside the interval $0 \le n \le N$. Suppose $x(0)$ is nonzero. Then, $2N$ or more samples of $|X(\omega)|$ over one period of $2\pi$ and the sample $x(0)$ uniquely specify the sample $x(N)$.

Although the theorem is stated for $x(n)$ in the interval $0 \le n \le N$, it also holds for $x(n)$ in any other interval on the $n$-axis. This is accomplished by a change of reference on the $n$-axis such that the first non-zero sample of $x(n)$ falls at the origin of the new coordinate system.

We now state our first set of conditions for uniquely specifying a signal $x(n)$ with $S_w(n,\omega)$. In this case we restrict the signal $x(n)$ to be one-sided. That is, $x(n) = 0$ for all $n < n'$ or for all $n > n'$, for some integer $n'$. Of course, the analysis window must have at least one non-zero sample so that $S_w(n,\omega)$ is not zero for all signals. Furthermore, we restrict $w(n)$ to be non-zero over its finite length, $N_w$. This simplifies the type of restriction imposed on $x(n)$ for avoiding the zero gap ambiguities discussed in section 3.1.


Conditions 3.1: For Representing $x(n)$ Uniquely With $S_w(n,\omega)$

w(n): a) Known sequence of finite length $N_w$
      b) No zeros within length $N_w$
x(n): a) One-sided
      b) At most $N_w - 2$ consecutive zero samples between any two non-zero samples
      c) Sign of first non-zero sample known


To show that $S_w(n,\omega)$ uniquely specifies the signal $x(n)$ under Conditions 3.1, let us consider the case when the analysis window $w(n)$ is restricted to the interval $0 \le n < N_w$. We do not lose any generality with this assumption, because any other placement can be accounted for by a change of reference on the $n$-axis. Under Conditions 3.1, we now show a procedure for recovering $x(n)$ from $S_w(n,\omega)$. The derivation is completed by showing that $x(n)$ is the only sequence that could have been obtained from $S_w(n,\omega)$ under Conditions 3.1.

We will consider only the case with $x(n)$ right-sided. The case with $x(n)$ left-sided can be proved analogously. Let $n'$ be the smallest value of $n$ such that $x(n')$ is non-zero. Then, with $L = 1$ in (3.1) and $w(n)$ as assumed above, it follows that $S_w(n,\omega)$ is zero for all $n < n'$. Furthermore,

$$S_w(n',\omega) = w^2(0)\,x^2(n') \quad \text{for all } \omega \qquad (3.6)$$

We then have

$$x(n') = \pm \frac{\sqrt{S_w(n',0)}}{w(0)} \qquad (3.7)$$
The sign ambiguity in this equation can be resolved since Conditions 3.1 specify the sign of the first non-zero sample, $x(n')$. Having determined $x(n')$, the next step is to use Theorem 2.1 for obtaining $x(n'+1)$. The short-time section $f_{n'+1}(m) = x(m)\,w(n'+1-m)$ has zero samples outside the interval $n' \le m \le n'+1$, and all its samples are known except at $m = n'+1$, where it equals $x(n'+1)\,w(0)$. The spectral magnitude $S_w(n'+1,\omega)$ of this section is known. Thus, applying Theorem 2.1 with $N = 1$, $x(n'+1)\,w(0)$ can be extrapolated. Since $w(n)$ was assumed known and non-zero over $0 \le n < N_w$, we divide $x(n'+1)\,w(0)$ by $w(0)$ to obtain $x(n'+1)$. We now continue such a procedure to determine each unknown sample of $x(n)$ after the samples preceding it have been determined. However, Theorem 2.1 requires that at least one of the $N_w - 1$ preceding samples be non-zero. This recursive procedure for determining $x(n)$ for $n > n'$ can be easily expressed in closed form. For each $n$, let $r_n(m)$ denote the autocorrelation function corresponding to $S_w(n,\omega)$. The autocorrelation function is given by

$$r_n(m) = \sum_{k=-\infty}^{\infty} x(k)\,w(n-k)\,x(k-m)\,w(n-(k-m)) \qquad (3.8)$$


Solving this equation for $x(n)$, we obtain

$$x(n) = \frac{r_n(m) - \sum_{k=-\infty}^{n-1} w(n-k)\,w(n-(k-m))\,x(k)\,x(k-m)}{w(0)\,w(m)\,x(n-m)} \qquad (3.9)$$

This is a valid equation only for values of $m$ for which $w(m)\,x(n-m)$ is non-zero. Since $w(m)$ is non-zero only for $0 \le m < N_w$, we require that $x(n-m)$ be non-zero for some $m$ in $0 < m < N_w$. This leads to the requirement that $x(n)$ have no more than $N_w - 2$ zero samples between any two non-zero samples. This is consistent with our observation in section 3.1 that there should not be a zero gap separating two non-zero portions of the signal such that no analysis window position has contributions from both non-zero portions. Since Conditions 3.1 include this requirement, it follows that the signal $x(n)$ can be obtained from $S_w(n,\omega)$ using the procedure we have just outlined.
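The recursion (3.6), (3.7), (3.9) can be sketched as follows. The sketch assumes NumPy and a right-sided signal stored from $n = 0$, and its function names are hypothetical. The autocorrelation $r_n(m)$ is obtained as the inverse DFT of the samples of $S_w(n,\omega)$. This requires at least $2N_w - 1$ frequency samples to avoid aliasing. Equation (3.9) holds for any admissible $m$. Choosing the $m$ with the largest $|w(m)\,x(n-m)|$ is our own choice for numerical robustness, not something prescribed by the derivation.

```python
import numpy as np

def short_time_mag_sq_l1(x, w, n_fft):
    """Row n samples S_w(n, omega) (L = 1) at n_fft frequencies, n = 0, ..., len(x)-1."""
    Nw = len(w)
    S = []
    for n in range(len(x)):
        seg = [x[m] * w[n - m] if m >= 0 else 0.0 for m in range(n - Nw + 1, n + 1)]
        S.append(np.abs(np.fft.fft(seg, n_fft)) ** 2)
    return np.array(S)


def reconstruct_max_overlap(S, w, length, first_sign=1.0):
    """Recover a right-sided x(n) from S_w(n, omega) using (3.6), (3.7) and (3.9).

    Assumes w(n) has no zeros on 0 <= n < N_w and each row of S has at least
    2 N_w - 1 samples, so the inverse DFT of a row gives r_n(m) without aliasing.
    """
    Nw = len(w)
    x = np.zeros(length)
    n0 = next(n for n in range(length) if S[n].max() > 0)   # n' of (3.6)
    x[n0] = first_sign * np.sqrt(S[n0][0]) / w[0]           # (3.7), sign supplied
    for n in range(n0 + 1, length):
        r = np.real(np.fft.ifft(S[n]))                       # r_n(m)
        # (3.9) holds for any m in 1..N_w-1 with w(m) x(n-m) != 0;
        # taking the largest |w(m) x(n-m)| is a numerical choice made here.
        m = max(range(1, min(Nw, n + 1)), key=lambda j: abs(w[j] * x[n - j]))
        denom = w[0] * w[m] * x[n - m]
        if denom == 0.0:
            raise ValueError("more than N_w - 2 consecutive zeros")
        tail = sum(w[n - k] * w[n - k + m] * x[k] * x[k - m]
                   for k in range(max(m, n + m - Nw + 1), n))
        x[n] = (r[m] - tail) / denom
    return x


w = np.hamming(4)                                  # N_w = 4, no zero samples
x = np.array([0.0, 0.0, 1.0, -0.8, 1.2, 0.0, 0.7, -1.1, 0.9])  # one-sample zero gap
S = short_time_mag_sq_l1(x, w, n_fft=8)
x_pos = reconstruct_max_overlap(S, w, len(x))
x_neg = reconstruct_max_overlap(S, w, len(x), first_sign=-1.0)
print(np.allclose(x_pos, x), np.allclose(x_neg, -x))   # True True
```

Passing `first_sign=-1.0` returns $-x(n)$, which shows that without the sign condition the recursion determines $x(n)$ only up to sign.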

Suppose there is another signal $x'(n)$ with the same $S_w(n,\omega)$, satisfying Conditions 3.1, and for which the sign of the first non-zero sample is the same as the sign of the first non-zero sample of $x(n)$. Since $S_w(n,\omega)$ has its first non-zero value at $n = n'$, it follows that $x'(n')$ must be the first non-zero value of $x'(n)$. We can then use the same reconstruction procedure for obtaining $x'(n)$ from $S_w(n,\omega)$ as we used for obtaining $x(n)$ from $S_w(n,\omega)$. However, that procedure yields only one answer. It follows that $x(n) = x'(n)$. We conclude that $x(n)$ is uniquely represented by $S_w(n,\omega)$ under Conditions 3.1.

From section 3.1, we know that $-x(n)$ has the same $S_w(nL,\omega)$ as $x(n)$. It follows that, with the opposite sign specified for the first non-zero sample, $-x(n)$ can be uniquely obtained from $S_w(n,\omega)$ under Conditions 3.1. Indeed, the only difference in obtaining $x(n)$ and $-x(n)$ using the procedure outlined above is that different signs are selected for $x(n')$ in (3.7). It follows that without the a priori sign knowledge in Conditions 3.1, $x(n)$ could have been obtained up to a sign ambiguity from $S_w(n,\omega)$.

The following set of conditions deals with the sign ambiguity in the representation with $S_w(n,\omega)$ by restricting the class of signals under consideration to be non-negative. In this case, the sign ambiguities due to any zero gaps also disappear.

Conditions 3.2: For Representing $x(n)$ Uniquely With $S_w(n,\omega)$

w(n): a) Known sequence of finite length and at least one nonzero sample
x(n): a) One-sided
      b) Non-negative


This set of conditions, like Conditions 3.1, restricts $x(n)$ to be one-sided. Let us consider extending the class of signals we can uniquely specify with the short-time spectral magnitude. To start the recursion of (3.9), knowledge of $N_w - 1$ consecutive samples of $x(n)$ is sufficient, provided one of those known samples is non-zero. Therefore, the requirement that $x(n)$ be one-sided is not necessary. Furthermore, although the recursion was derived for increasing $n$, a similar procedure can be derived for decreasing $n$. Using these observations, the following conditions for unique signal representation with $S_w(n,\omega)$ can be derived. They apply to a wider class of signals than Conditions 3.1.

Conditions 3.3: For Representing $x(n)$ Uniquely With $S_w(n,\omega)$

w(n): a) Known sequence of finite length $N_w$
      b) No zeros within length $N_w$
x(n): a) $N_w - 1$ consecutive samples known, at least one of which is nonzero
      b) At most $N_w - 2$ consecutive zero samples between any two nonzero samples


3.2.2  Partial  Analysis  Window  Overlap 

We will now develop a set of conditions that are sufficient for uniquely specifying a signal with its short-time spectral magnitude computed with partial analysis window overlap (i.e., $L > 1$ in $S_w(nL,\omega)$). The signal $x(n)$ is restricted to be one-sided. Furthermore, the analysis window $w(n)$ is assumed to be a known sequence with no zero samples over its finite length. As shown in section 3.1, even if we do not allow any zero samples within a finite-length $x(n)$, there are signals which are not specified even up to a sign ambiguity by $S_w(nL,\omega)$ with $L > 1$. In the conditions below, we counter those ambiguities with knowledge of $L$ consecutive samples of the signal, starting from the first non-zero sample.
Conditions 3.4: For Representing $x(n)$ Uniquely With $S_w(nL,\omega)$


In the above conditions, $\lceil x \rceil$ denotes the smallest integer greater than or equal to $x$. The derivation of these conditions relies heavily on Theorem 2.3 of chapter 2, restated below for easy reference.

Theorem 2.3

Let $x(n)$ be a sequence that is zero outside the interval $0 \le n \le N$. Suppose $x(0)$ is non-zero. Then, $2N$ or more samples of $|X(\omega)|$ in an interval of length $2\pi$ and the $P$ samples of $x(n)$ in the interval $0 \le n < P$ uniquely specify the entire sequence $x(n)$ if and only if $P \ge \lceil M/2 \rceil$ (where $M = N+1$ and $\lceil a \rceil$ is the smallest integer greater than or equal to $a$).

As indicated in chapter 2, this extrapolation theorem holds regardless of the position of $x(n)$ on the $n$-axis. Let $n'$ be the smallest $n$ for which $x(n) \ne 0$. Without loss of generality, assume $1 \le n' \le L$ (see Figure 3.2a). Let $x_L(n)$ denote a sequence which equals $x(n)$ for $n' \le n < n'+L$ and is zero otherwise. Thus, $x_L(n)$ represents the $L$ known initial samples required by Conditions 3.4. Without loss of generality, we also assume that $w(n)$ occupies the region $0 \le n < N_w$. Since $x_L(n)$ is known, it follows that $x(n)$ under the analysis window $w(L-n)$ is known. The first objective is to recover any unknown samples of $x(n)$ over the duration of $w(2L-n)$.

In order to recover the unknown samples of $x(n)$ under $w(2L-n)$, consider the sequence $y_2(n) = x(n)\,w(2L-n)$ illustrated in Figure 3.2b. Since $P = L$ and $L \le \lfloor N_w/2 \rfloor$, knowledge of $x_L(n)$ assures that at least $L$ samples of $y_2(n)$ beginning at $n = n'$ are known. Furthermore, the length of $y_2(n)$ is $2L - n' + 1$ and $L \ge \lceil (2L - n' + 1)/2 \rceil$. Therefore, applying Theorem 2.3, the unknown samples of $y_2(n)$ are uniquely determined by $S_w(2L,\omega)$ and the initial conditions $x_L(n)$. Since $w(n)$ is nonzero over its duration, the unknown values of $x(n)$ under $w(2L-n)$ are obtained by division.

We have now determined $x(n)$ up to $n = 2L$. We will next show that if $x(n)$ is known up to $n = (k'-1)L$, then $x(n)$ is uniquely determined up to $n = k'L$ under Conditions 3.4. By induction, $x(n)$ is then uniquely determined for all $n \ge n'$.

Consider the short-time segment $y_{k'}(n) = x(n)\,w(k'L-n)$ for a particular $k = k'$. Suppose further that $x(n)$ is known up to the last sample of $w((k'-1)L-n)$, that is, up to $n = (k'-1)L$. Then, beginning at $n = k'L - N_w + 1$ (see Figure 3.3), $N_w - L$ consecutive samples of $y_{k'}(n)$ are known. The next objective is to recover the last $L$ samples of $y_{k'}(n)$ from its first $N_w - L$ samples. Clearly, the ability to do so depends on the value of $L$.

Suppose $L > \lfloor N_w/2 \rfloor$. Then $N_w - L < \lceil N_w/2 \rceil$. Consequently, from Theorem 2.3, the $L$ unknown samples of $y_{k'}(n)$ are not, in general, uniquely specified by $S_w(k'L,\omega)$.

Suppose $1 < L \le \lfloor N_w/2 \rfloor$. Furthermore, suppose that the initial value $y_{k'}(k'L - N_w + 1)$ is non-zero. The $N_w - L$ values of $y_{k'}(n)$ starting from $n = k'L - N_w + 1$ are known. Since $N_w - L \ge \lceil N_w/2 \rceil$ and $S_w(k'L,\omega)$ is known, $y_{k'}(n)$ is completely determined by using Theorem 2.3. Now consider the cases when the first non-zero value of $y_{k'}(n)$ occurs beyond $n = k'L - N_w + 1$. In particular, suppose that there are at most $J$ consecutive zeros in $y_{k'}(n)$ starting at $n = k'L - N_w + 1$ (see Figure 3.3c). Let us find the largest $J$ for which the $L$ unknown samples of $y_{k'}(n)$ can be determined. Theorem 2.3 requires at least $L$ known samples preceding the $L$ unknown samples. Thus, the maximum allowable value of $J$ is $N_w - (L + L) = N_w - 2L$. This is consistent with Conditions 3.4.

Fig. 3.2 Sequences for Proof of Conditions 3.4 (x(n) under the windows w(L - n) and w(2L - n); the segment y_2(n) has length 2L - n' + 1; • known value, o unknown value)

Fig. 3.3 Sequences for Proof of Conditions 3.4 (the segment y_k'(n) beginning at y_k'(k'L - N_w + 1); • known value, o unknown value)

We have shown that $x(n)$ can be uniquely determined from $S_w(nL,\omega)$ under Conditions 3.4. Suppose another signal $x'(n)$ also had the short-time spectral magnitude $S_w(nL,\omega)$ and satisfied Conditions 3.4 with the same $L$ initial known samples as $x(n)$. Applying the procedure outlined above, we obtain $x'(n)$. However, the procedure is identical to the one used for obtaining $x(n)$. Since the procedure gives a unique answer, it follows that $x'(n) = x(n)$. Thus, under Conditions 3.4, a signal is uniquely represented by its short-time spectral magnitude.
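The sequential procedure of this proof can be sketched by applying the forward substitution of Theorem 2.3 one window position at a time. The Python sketch below assumes NumPy and uses hypothetical function names. It indexes the signal so that its first non-zero sample is at $n = 0$, which makes the $L$ known initial samples $x(0), \ldots, x(L-1)$. For each window position $k$, it locates the first known non-zero sample of the section and checks the $P \ge \lceil M/2 \rceil$ condition of Theorem 2.3. It then solves for the remaining samples of the section and divides by the window. The example uses a rectangular window mainly to keep its round-off error small; any window with no zero samples fits the conditions.

```python
import numpy as np

def short_time_mag_sq(x, w, L, n_fft):
    """Row k samples S_w(kL, omega) at n_fft frequencies (x(m) = 0 outside the array)."""
    Nw = len(w)
    rows = []
    for k in range((len(x) + Nw) // L + 1):
        seg = [x[m] * w[k * L - m] if 0 <= m < len(x) else 0.0
               for m in range(k * L - Nw + 1, k * L + 1)]
        rows.append(np.abs(np.fft.fft(seg, n_fft)) ** 2)
    return np.array(rows)


def reconstruct_partial_overlap(S, w, L, initial, length):
    """Sequentially recover x(0), ..., x(length-1) from S_w(kL, omega).

    Assumes x(0) != 0 is the first non-zero sample, initial = x(0..L-1),
    w(n) has no zeros on 0 <= n < N_w, and S has at least 2 N_w - 1 columns.
    """
    Nw = len(w)
    x = np.zeros(length + L)            # room for samples past the end (they come out ~0)
    x[:L] = initial
    known = L - 1                       # index of the last recovered sample
    k = 1
    while known < length - 1:
        end = k * L                     # section k covers m = end - Nw + 1, ..., end
        m_known = np.arange(max(0, end - Nw + 1), known + 1)
        f_known = x[m_known] * w[end - m_known]
        nz = np.flatnonzero(f_known)
        if nz.size == 0:
            raise ValueError("zero gap too long for this window and L")
        s = m_known[nz[0]]              # first non-zero sample of the section
        M, P = end - s + 1, known - s + 1
        if P < (M + 1) // 2:            # Theorem 2.3: P >= ceil(M/2)
            raise ValueError("not enough known samples in section %d" % k)
        r = np.real(np.fft.ifft(S[k]))  # autocorrelation of section k
        seg = np.zeros(M)
        seg[:P] = f_known[nz[0]:]
        for lag in range(M - 1, P - 1, -1):    # forward substitution, as in (2.12)
            tail = sum(seg[i] * seg[i + lag] for i in range(1, M - lag))
            seg[lag] = (r[lag] - tail) / seg[0]
        m_new = np.arange(known + 1, end + 1)
        x[m_new] = seg[P:] / w[end - m_new]    # undo the window
        known, k = end, k + 1
    return x[:length]


x_true = np.array([1.0, -0.8, 1.2, 0.7, -1.1, 0.9, 1.3, -0.6, 0.8, -1.0, 1.1, 0.5])
w = np.ones(6)                          # rectangular window, N_w = 6
L = 2                                   # L <= floor(N_w / 2)
S = short_time_mag_sq(x_true, w, L, n_fft=2 * len(w))
x_hat = reconstruct_partial_overlap(S, w, L, x_true[:L], len(x_true))
print(np.allclose(x_hat, x_true, atol=1e-6))   # True
```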


3.3  Multidimensional  Extension 

This section extends signal representation with short-time spectral magnitude to multidimensional discrete-time signals. Since the extension is conceptually straightforward but notationally cumbersome, it will be presented here only for the short-time spectral magnitude with maximum analysis window overlap and for two-dimensional signals with finite support. For a two-dimensional signal $x(n,m)$, the short-time spectral magnitude is given by

$$S_w(n,m;\omega,\nu) = \left| \sum_{m_1=-\infty}^{\infty} \sum_{m_2=-\infty}^{\infty} x(m_1,m_2)\,w(n-m_1,\,m-m_2)\,e^{-j\omega m_1} e^{-j\nu m_2} \right|^2$$

where $w(n,m)$ is the two-dimensional analysis window.

Let $[x(n,m)]_N$ represent the class of two-dimensional signals whose finite regions of support contain no blocks of zeros larger than $(N-2) \times (N-2)$. This is a generalization of the one-dimensional condition of finite length with no gaps of more than $N-2$ zeros within the length. Then, the following conditions are sufficient for reconstruction up to a sign ambiguity.


w(n,m): a) Non-zero over its $N \times N$ rectangular support
x(n,m): a) Belongs to $[x(n,m)]_N$
        b) Sign of one non-zero sample known


The derivation of these conditions is analogous to the one used for one-dimensional signals. In particular, sequential reconstruction procedures can be easily designed in a manner similar to the sequential extrapolation procedures based on the theorems in chapter 2. One such procedure for obtaining $x(n,m)$ proceeds along successive columns (rows). Suppose, in particular, that $x(n,m)$ has been computed up to the $(k-1)$th column and the $(r-1)$th row (see Figure 3.4a). Then the next value can be determined from the autocorrelation of the region shown in the box, along with all the known samples within the box, in a manner analogous to that for one-dimensional signals. An alternative method of computation proceeds along successive lines of the form $m = -n + n'$ for some constant $n'$. This approach is illustrated in Figure 3.4b.
CHAPTER FOUR: SIGNAL RECONSTRUCTION FROM
SHORT-TIME  SPECTRAL  MAGNITUDE 

We have established a number of conditions under which a signal is uniquely represented by its short-time spectral magnitude. However, for such a signal representation to be practical, we need techniques that reconstruct a signal from its short-time spectral magnitude. In this chapter, we develop such techniques, particularly for reconstructing finite-length signals because of their importance in practical applications. In chapter 3, we introduced one such technique while developing conditions for unique signal correspondence with the short-time spectral magnitude. That technique belongs to a more general class of techniques, described in section 4.1, which reconstruct the short-time sections of a signal in an order determined by their positions on the time axis. We call this the sequential extrapolation approach.

The main characteristic of sequential extrapolation techniques is that they extrapolate each short-time section using only its own spectral magnitude. A number of theorems were presented in chapter 2 for such extrapolation and used in the reconstruction procedure of chapter 3. However, in those cases only a portion of the known information was used to perform the extrapolation. In sections 4.2 and 4.3, we consider techniques which use more of the known information. In particular, we develop techniques which require the extrapolated short-time section to match the entire known information using various error criteria. This is particularly useful when the known information is not exact. For example, we will see in section 4.4 that the reconstruction techniques of this chapter are less sensitive to round-off errors than the extrapolation procedures of chapter 2. Furthermore, in later chapters, we will see that these techniques give better signal reconstructions when the short-time spectral magnitude is purposely modified for accomplishing signal processing tasks such as noise reduction and time-scale modification of speech.

The final section of this chapter presents an alternative reconstruction approach that is referred to as simultaneous extrapolation. Rather than matching the known information for each short-time section individually, the idea is to choose all the unknown samples in a way that minimizes an error criterion defined over the entire short-time spectral magnitude. However, the resulting algorithms require the simultaneous solution of as many equations as there are samples to be reconstructed. This becomes computationally prohibitive even for average-length signals in various applications. To reduce this computational complexity, one may consider extrapolating several short-time sections simultaneously, but extrapolating each such group in sequential order. Such techniques have not been implemented in this thesis. However, they are expected to perform even better than the sequential techniques of this chapter when applied to the time-scale modification and noise reduction applications of chapters 6 and 7.


4.1  The  Sequential  Extrapolation  Approach 

The short-time spectral magnitude of a signal x(n) for a positive integer L and an analysis
window w(n) is given by

$$S_w(nL,\omega) = \left|\sum_{m=-\infty}^{\infty} x(m)\,w(nL-m)\,e^{-j\omega m}\right|^2 \tag{4.1}$$

We assume that w(n) is a known sequence of finite length N_w with no zero samples over that length.
Furthermore, these nonzero samples are in the region 0 ≤ n < N_w. The signal x(n) is finite-length,
with no more than N_w − 2L consecutive zeros separating any two nonzero samples for L > 1. If L = 1,
at most N_w − 2 consecutive zeros are allowed between two nonzero samples of x(n). It is also assumed
that the first nonzero sample of x(n) falls at n = 0. Finally, we assume that the L samples of x(n)
for 0 ≤ n < L are known. These assumptions are required by all the algorithms described in this
chapter.
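As a minimal illustration of (4.1), the Python sketch below evaluates S_w(nL, ω) on the uniform grid ω = 2πr/M by direct summation. The function name and the list conventions (x and w stored from index 0, zero elsewhere) are assumptions of the sketch, not part of the thesis. Choosing M ≥ 2N_w − 1 keeps the inverse DFT of these samples, which is the autocorrelation of the windowed section, free of time aliasing.

```python
import cmath
import math


def short_time_spectral_magnitude(x, w, L, n, M):
    """Return [S_w(nL, 2*pi*r/M) for r in range(M)] following (4.1).

    x: samples x(0), x(1), ... (x is zero outside this list)
    w: window samples w(0), ..., w(Nw-1) (w is zero elsewhere)
    """
    out = []
    for r in range(M):
        omega = 2 * math.pi * r / M
        acc = 0j
        for k, wk in enumerate(w):  # k = nL - m indexes the window
            m = n * L - k
            if 0 <= m < len(x):
                acc += x[m] * wk * cmath.exp(-1j * omega * m)
        out.append(abs(acc) ** 2)
    return out


# Rectangular window of length 2, L = 1: the section at n = 2 holds x(1), x(2).
S = short_time_spectral_magnitude([1.0, 2.0, 3.0, 4.0], [1.0, 1.0], 1, 2, 4)
print(round(S[0], 9), round(S[2], 9))  # omega = 0 and omega = pi: 25.0 1.0
```

At ω = 0 the value is simply the squared sum of the windowed section, and at ω = π it is the squared alternating sum, which is what makes these two frequencies cheap to use later in this section.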

The sequential extrapolation approach to signal reconstruction from short-time spectral
magnitude is illustrated in Figure 4.1. The L known samples of x(n) completely determine the
short-time section corresponding to S_w(nL, ω) for n = 1. The short-time section corresponding to
S_w(nL, ω) for n = 2 can then be extrapolated from its spectral magnitude and its known samples
in the region of overlap with the previously determined short-time section. This process continues,
as the complete extrapolation of each new short-time section makes possible the extrapolation of
the next overlapping short-time section. The reconstruction stops when a short-time section is
encountered for which the known samples are not sufficient to complete the extrapolation. Under
the conditions outlined at the beginning of this section, we know from chapter 3 that the
reconstruction stops only after all the nonzero short-time sections have been extrapolated.
Furthermore, since the analysis window is nonzero over the length of each short-time section,
dividing the short-time sections by the analysis window yields the required samples of the signal x(n).

Fig. 4.1 Sequential Extrapolation Approach (x(n): finite length, with x(0) the first nonzero sample; w(n): nonzero over 0 ≤ n < N_w)

The techniques used in the proofs of the various theorems in chapter 2 can be used to
accomplish the extrapolation step of the sequential extrapolation approach. For example, in
section 3.1 of chapter 3 we used the techniques in the proofs of Theorems 2.1 and 2.3 for the
extrapolation. In this section we apply the technique of the proof of Theorem 2.2 to the
extrapolation step. In that theorem we saw that the last sample of a finite-length signal (i.e., a
sample after which the signal is always zero) can be extrapolated from the preceding samples and two
appropriately chosen samples of the spectral magnitude. In the proof of that theorem, it was
also shown that the two appropriate samples of the spectral magnitude can be found even if the
spectral magnitude is uniformly sampled in frequency at a rate greater than 2π/(2N − 3).

For each n, let the short-time section of x(n) whose last sample falls at n be denoted by
f_n(m). If the sample of f_n(m) at m = n is replaced by zero, the resulting sequence is denoted by
g_n(m) and its spectrum is denoted by G_n(ω). Then, Theorem 2.2 solves for the sample x(n)
through the quadratic equation:


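$$x^2(n) + b(n,\omega)\,x(n) + c(n,\omega) = 0 \tag{4.2}$$

where

$$b(n,\omega) = 2\,\mathrm{Re}\left[G_n(\omega)\,e^{j\omega n}\right] \tag{4.3}$$

and

$$c(n,\omega) = |G_n(\omega)|^2 - S_w(n,\omega) \tag{4.4}$$

These coefficients follow from expanding |G_n(ω) + x(n)e^{−jωn}|² = S_w(n, ω), with the window weight
on the last sample of f_n(m) taken to be unity (as for a rectangular window).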

Observe that this technique uses only two frequency samples of S_w(n, ω) for each value of
n. With speech waveforms we have found that an arbitrary choice of the two frequency values
generally yields two distinct quadratic equations, whose only common root is then x(n). In
particular, to reduce the computational load, a good choice is ω = 0 and ω = π. In this case
(assuming the spectrum is sampled at ω = 2πr/M with M even),

$$G_n(2\pi r/M)\big|_{r=0} = \sum_m g_n(m) \tag{4.5}$$

$$G_n(2\pi r/M)\big|_{r=M/2} = \sum_m (-1)^m\, g_n(m) \tag{4.6}$$
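The following Python sketch strings these pieces together into the sequential extrapolation of section 4.1, under assumptions chosen for brevity: L = 1, a rectangular window of length N_w (so the weight on x(n) in (4.2) is unity), exact spectral samples at ω = 0 and ω = π, and a known x(0). The function names are hypothetical. Rather than computing the roots of both quadratics, it uses the fact that a root shared by two distinct monic quadratics also satisfies their difference, (b(n,0) − b(n,π)) x(n) + c(n,0) − c(n,π) = 0, and it raises an error if the two quadratics coincide.

```python
def spectral_pair(x, n, Nw):
    """S_w(n, 0) and S_w(n, pi) for the rectangular-window section ending at n."""
    ms = [m for m in range(n - Nw + 1, n + 1) if 0 <= m < len(x)]
    s0 = sum(x[m] for m in ms) ** 2
    spi = sum(x[m] * (-1) ** m for m in ms) ** 2
    return s0, spi


def reconstruct(S0, Spi, x0, length, Nw, tol=1e-12):
    """Sequentially recover x(1), ..., x(length-1) from x(0) and the pairs
    S0[n] = S_w(n, 0), Spi[n] = S_w(n, pi), using (4.2)-(4.6)."""
    x = [x0]
    for n in range(1, length):
        known = range(max(0, n - Nw + 1), n)          # support of g_n(m)
        g0 = sum(x[m] for m in known)                  # (4.5): G_n(0)
        gpi = sum(x[m] * (-1) ** m for m in known)     # (4.6): G_n(pi)
        b0, bpi = 2 * g0, 2 * (-1) ** n * gpi          # (4.3) at w = 0 and w = pi
        c0, cpi = g0 ** 2 - S0[n], gpi ** 2 - Spi[n]   # (4.4)
        if abs(b0 - bpi) < tol:
            raise ValueError(f"quadratics coincide at n = {n}")
        x.append((cpi - c0) / (b0 - bpi))             # common root of both quadratics
    return x


signal = [1.0, 0.5, -0.3, 0.8, 0.2, -0.6, 0.4, 0.1]
Nw = 4
pairs = [spectral_pair(signal, n, Nw) for n in range(len(signal))]
S0 = [p[0] for p in pairs]
Spi = [p[1] for p in pairs]
estimate = reconstruct(S0, Spi, signal[0], len(signal), Nw)
print([round(v, 6) for v in estimate])  # matches `signal`
```

With inexact magnitudes the two quadratics generally no longer share a root, so this shortcut is only appropriate for exact data.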

If the analysis window is rectangular, the computational load can be reduced further because
G_n(ω) can be computed recursively:

$$G_n(\omega) = G_{n-1}(\omega) - x(n-N_w)\,e^{-j\omega(n-N_w)} + x(n-1)\,e^{-j\omega(n-1)} \tag{4.7}$$
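A quick way to check the update (4.7) is to compare it with direct summation. In this hypothetical Python sketch, g_n(m) is the rectangular-window section ending at n with its last sample removed, expressed in absolute time m (the convention used in (4.3)); each recursive update costs O(1) operations instead of O(N_w).

```python
import cmath


def g_direct(x, n, Nw, omega):
    """G_n(omega) summed directly over the support n-Nw+1 <= m <= n-1 of g_n(m)."""
    return sum(x[m] * cmath.exp(-1j * omega * m)
               for m in range(max(0, n - Nw + 1), min(n, len(x))))


def g_recursive(x, Nw, omega, n_max):
    """G_0(omega), ..., G_{n_max}(omega) via the update (4.7)."""
    def xs(m):
        return x[m] if 0 <= m < len(x) else 0.0

    G = [0j]  # g_0(m) is supported on m < 0, where x(m) = 0
    for n in range(1, n_max + 1):
        G.append(G[-1]
                 - xs(n - Nw) * cmath.exp(-1j * omega * (n - Nw))   # sample leaving
                 + xs(n - 1) * cmath.exp(-1j * omega * (n - 1)))    # sample entering
    return G


x = [1.0, 0.5, -0.3, 0.8, 0.2, -0.6, 0.4, 0.1]
Gr = g_recursive(x, 4, 0.7, 10)
print(max(abs(Gr[n] - g_direct(x, n, 4, 0.7)) for n in range(11)) < 1e-9)  # True
```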

We still have to address the problem of synthesizing the entire reconstructed signal from its
short-time sections. We have assumed that the analysis window is nonzero over its length N_w. It
follows that we can divide each short-time section by the analysis window to obtain the
corresponding samples of the reconstructed signal. Alternatively, we can select the analysis
window w(n) such that

$$\sum_{n=-\infty}^{\infty} w(nL-m) = 1 \quad \text{for all } m \tag{4.8}$$

In such a case, the entire signal can be reconstructed by simply adding all its short-time sections.
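One simple window meeting (4.8), offered here only as an example, is a rectangular window of length N_w scaled by L/N_w when L divides N_w: exactly N_w/L shifted copies cover every m. Because it has no zero samples, it also meets the assumption made at the start of section 4.1. The Python sketch below (function names hypothetical) checks (4.8) numerically and confirms that adding the sections returns the signal.

```python
def window_sum(w, L, m):
    """Left-hand side of (4.8): sum over all integers n of w(nL - m)."""
    Nw = len(w)
    n = -(-m // L)          # ceil(m / L): smallest n with nL - m >= 0
    total = 0.0
    while n * L - m < Nw:
        total += w[n * L - m]
        n += 1
    return total


def overlap_add(x, w, L):
    """Form the sections x(m) w(nL - m) and simply add them back together."""
    Nw = len(w)
    y = [0.0] * len(x)
    for n in range((len(x) + Nw) // L + 1):
        for m in range(max(0, n * L - Nw + 1), min(len(x), n * L + 1)):
            y[m] += x[m] * w[n * L - m]
    return y


Nw, L = 8, 2
w = [L / Nw] * Nw   # rectangular window scaled by L/Nw; satisfies (4.8) since L divides Nw
x = [1.0, 0.5, -0.3, 0.8, 0.2, -0.6, 0.4, 0.1]
print([window_sum(w, L, m) for m in range(-3, 4)])        # all 1.0
print(max(abs(a - b) for a, b in zip(overlap_add(x, w, L), x)))
```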


4.2 Least-Squares Sequential Extrapolation

In this section we develop a least-squares technique for the short-time extrapolation step of
the sequential signal reconstruction procedure of the previous section (see Figure 4.1). The
major idea here is to use more information from the short-time spectral magnitude than is strictly
necessary to reconstruct the signal. This makes the reconstruction algorithm more robust to
errors in the short-time spectral magnitude, as will be seen in section 4.4 and later chapters. The
analysis window of the short-time spectrum, and therefore the corresponding short-time sections
of the signal, are assumed to be finite-length. From section 4.1, each short-time section is
extrapolated from a set of its known samples and the spectral magnitude of the section.

Let f(n) be the short-time section being extrapolated. To simplify notation, we assume
that f(0) is the first nonzero sample of f(n). However, the technique developed here is not
affected by the particular location of the first nonzero sample. Assume that the analysis window
is N_w points long, so that f(n) is known to be zero for n ≥ N_w. In the sequential extrapolation
approach, as outlined in Figure 4.1, the known samples of f(n) are in the range 0 ≤ n < M,
where M ≥ N_w/2 for N_w even and M ≥ (N_w − 1)/2 for N_w odd. The problem is to extrapolate the
unknown samples of f(n) in the range M ≤ n < N_w. For this we use a least-squares
algorithm that minimizes

$$E = \sum_{m=-\infty}^{\infty} \big(r(m) - s(m)\big)^2 \tag{4.9}$$

where r(m) is the autocorrelation function obtained by taking the inverse Fourier transform of
the squared spectral magnitude of f(n). The function s(m) represents the inverse Fourier
transform of the squared spectral magnitude of the reconstructed f(n). By Parseval's theorem
[1], minimizing the above expression is equivalent to minimizing the integral of the squared
difference between the squared spectral magnitude of f(n) and the squared spectral magnitude
of the reconstructed version of f(n). Both r(m) and s(m) are autocorrelation functions of real
sequences that are at most N_w samples long. It follows that r(m) and s(m) are even sequences of
maximum duration 2N_w − 1. Under such conditions, (4.9) reduces (up to the weighting of the m = 0
term) to a sum over the lags 0 ≤ m ≤ N_w − 1, and we minimize

$$E = \sum_{m=0}^{N_w-1} \big(r(m) - s(m)\big)^2 \tag{4.10}$$
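The criterion (4.10) is easy to evaluate directly. The Python sketch below (names hypothetical) computes the one-sided autocorrelation and the error E for a candidate section. Its last line shows why the known samples f(n), 0 ≤ n < M, are indispensable: a time-reversed section has exactly the same autocorrelation, so E alone cannot tell the two apart.

```python
def autocorr(f):
    """One-sided autocorrelation r(m) = sum_k f(k) f(k+m), m = 0..len(f)-1."""
    N = len(f)
    return [sum(f[k] * f[k + m] for k in range(N - m)) for m in range(N)]


def ls_error(r, f_est):
    """E of (4.10): squared mismatch between the given r(m) and the
    autocorrelation s(m) of a candidate section f_est."""
    s = autocorr(f_est)
    return sum((rm - sm) ** 2 for rm, sm in zip(r, s))


f = [1.0, -0.5, 0.25, 2.0]
r = autocorr(f)
print(ls_error(r, f))                         # 0.0: the true section
print(ls_error(r, [1.0, -0.5, 0.25, 1.5]))    # > 0: wrong last sample
print(ls_error(r, f[::-1]))                   # 0.0: time reversal has the same r(m)
```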

To minimize E, we set its derivative with respect to the unknown samples of f(n) to zero. If
there are L unknown samples, this procedure yields a system of L simultaneous cubic equations
in the L unknowns. For example, if f(N_w − 1) is the only unknown, we get the following cubic
equation:

$$2f^3(N_w-1) - \big(2r(0) - 3t(0)\big)\,f(N_w-1) - \sum_{m=1}^{N_w-1}\big(r(m) - t(m)\big)\,f(N_w-1-m) = 0 \tag{4.11}$$

where t(m) is the autocorrelation of the sequence obtained from f(n) by setting f(N_w − 1) equal
to zero. Generally, this equation will have two complex conjugate roots and one real root. If the
signal being reconstructed is known to be real, we clearly select the real root.
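For a single unknown, (4.11) has no quadratic term, so its real roots can be found in closed form. The Python sketch below does this with Cardano's formula (trigonometric form when all three roots are real). With exact data the cubic factors as (a − f(N_w − 1))(2a² + 2f(N_w − 1)a + t(0)), so the other two roots form a complex-conjugate pair whenever f²(N_w − 1) < 2t(0). The rule used when three real roots do appear, picking the one with the smallest E, is an assumption of this sketch and not taken from the thesis; the function names are hypothetical.

```python
import math


def autocorr(f):
    N = len(f)
    return [sum(f[k] * f[k + m] for k in range(N - m)) for m in range(N)]


def real_roots_depressed_cubic(P, Q):
    """Real roots of a^3 + P a + Q = 0 (Cardano / trigonometric form)."""
    D = (Q / 2) ** 2 + (P / 3) ** 3
    if D > 0:                                    # exactly one real root
        A = -Q / 2 + math.copysign(math.sqrt(D), -Q / 2)
        u = math.copysign(abs(A) ** (1 / 3), A)  # real cube root, no cancellation
        return [u - P / (3 * u)]                 # u + v with u*v = -P/3
    if P == 0:                                   # then Q == 0 too: triple root
        return [0.0]
    rho = 2 * math.sqrt(-P / 3)                  # three real roots
    arg = max(-1.0, min(1.0, (3 * Q / (2 * P)) * math.sqrt(-3 / P)))
    theta = math.acos(arg) / 3
    return [rho * math.cos(theta - 2 * math.pi * k / 3) for k in range(3)]


def extrapolate_last_cubic(r, known):
    """Solve (4.11) for f(Nw-1) given r(0..Nw-1) and f(0..Nw-2)."""
    Nw = len(known) + 1
    t = autocorr(list(known) + [0.0])            # f(Nw-1) set to zero
    C = sum((r[m] - t[m]) * known[Nw - 1 - m] for m in range(1, Nw))
    # (4.11) divided by 2: a^3 - ((2r(0) - 3t(0)) / 2) a - C / 2 = 0
    roots = real_roots_depressed_cubic(-(2 * r[0] - 3 * t[0]) / 2, -C / 2)

    def err(a):                                  # E of (4.10) for a candidate root
        s = autocorr(list(known) + [a])
        return sum((rm - sm) ** 2 for rm, sm in zip(r, s))

    return min(roots, key=err)


f = [1.0, -0.5, 0.25, 0.8]
print(round(extrapolate_last_cubic(autocorr(f), f[:-1]), 9))  # 0.8
```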

For situations with more than one unknown sample, the system of simultaneous cubic equations
is difficult to solve. One possible approach to simplify the equations is to neglect some of
the terms in (4.10). If there are L unknowns and we neglect the terms for 0 ≤ m < L, we obtain
a set of L simultaneous linear equations in the L unknowns, because for m ≥ L no product in s(m)
involves two unknown samples. For example, if L = 1 we obtain the following linear equation for
f(N_w − 1):

$$f(N_w-1) = \frac{\sum_{m=1}^{N_w-1}\big(r(m) - t(m)\big)\,f(N_w-1-m)}{\sum_{m=1}^{N_w-1} f^2(N_w-1-m)} \tag{4.12}$$

where t(m) is the autocorrelation of the sequence obtained from f(n) by setting f(N_w − 1) equal
to zero.
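The linearized criterion extends to L unknowns as described above: for lags m ≥ L, s(m) = t(m) + Σ_j A(m, j) f(M + j), where A(m, j) = f(M + j − m) when 0 ≤ M + j − m < M and zero otherwise, and t(m) is now the autocorrelation with all L unknown samples set to zero. The Python sketch below (names hypothetical) builds this system and solves its normal equations by Gaussian elimination, a choice made here only to keep the example self-contained; with L = 1 it reduces to (4.12).

```python
def autocorr(f):
    N = len(f)
    return [sum(f[k] * f[k + m] for k in range(N - m)) for m in range(N)]


def solve(A, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda i: abs(M[i][c]))
        M[c], M[p] = M[p], M[c]
        for i in range(c + 1, n):
            factor = M[i][c] / M[c][c]
            for j in range(c, n + 1):
                M[i][j] -= factor * M[c][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def linear_ls_extrapolate(r, known, L):
    """Estimate the last L samples of a section of length len(known) + L from
    r(0..Nw-1), keeping only the lags m >= L of (4.10)."""
    M = len(known)
    Nw = M + L
    padded = list(known) + [0.0] * L
    t = autocorr(padded)                       # unknown samples set to zero
    lags = range(L, Nw)
    A = [[padded[M + j - m] if 0 <= M + j - m < M else 0.0 for j in range(L)]
         for m in lags]
    d = [r[m] - t[m] for m in lags]
    # Normal equations (A^T A) u = A^T d.
    AtA = [[sum(row[i] * row[j] for row in A) for j in range(L)] for i in range(L)]
    Atd = [sum(row[i] * dk for row, dk in zip(A, d)) for i in range(L)]
    return solve(AtA, Atd)


f = [1.0, -0.5, 0.25, 0.8, -0.4, 0.6]
r = autocorr(f)
print([round(v, 9) for v in linear_ls_extrapolate(r, f[:5], 1)])  # [0.6]
print([round(v, 9) for v in linear_ls_extrapolate(r, f[:4], 2)])  # [-0.4, 0.6]
```

With exact autocorrelation values the linear system is consistent and returns the true samples; when r(m) has been modified, the same code returns the least-squares compromise over the retained lags.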


4.3  Iterative  Sequential  Extrapolation 

In this section we develop an iterative technique for extrapolating a finite-length sequence
from some of its known samples and the spectral magnitude of the sequence. This procedure
can be used for the extrapolation of each short-time section in the sequential extrapolation
technique (see Figure 4.1) for signal reconstruction from short-time spectral magnitude. As in the
least-squares technique, the main idea here is to develop a reconstruction algorithm that uses
more information than is strictly necessary to reconstruct the signal. In the following section, as
well as in later chapters, this algorithm proves to be very robust to errors in the short-time
spectral magnitude information.
Following the notation of section 4.2, let f(n) be the short-time section being extrapolated.
To simplify notation, we assume that f(0) is the first nonzero sample of f(n). Assuming
the analysis window is N_w samples long, f(n) is known to be zero for n ≥ N_w. In the sequential
extrapolation approach, as outlined in Figure 4.1, the known samples of f(n) are in the
range 0 ≤ n < M, where M ≥ N_w/2 for N_w even and M ≥ (N_w − 1)/2 for N_w odd. In this section,
we present an iterative technique that goes back and forth between the time and frequency
domains, imposing the known constraints in each domain (see Figure 4.2). The constraints
imposed in the time domain are all the known samples of f(n) outside the region M ≤ n < N_w.
In the frequency domain, on the other hand, we impose the known spectral magnitude of f(n).
The goal is to have the technique converge to the correct values of the unknown samples of
f(n) in the region M ≤ n < N_w.
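A minimal Python sketch of the loop in Figure 4.2 follows. The DFT length K ≥ 2N_w − 1, the zero initial guess for the unknown samples, the fixed number of passes and the function names are assumptions of this sketch. Because each step is the closest-point projection onto its constraint set (the time-domain set here also requires real values), the magnitude error recorded at each pass cannot increase; that property alone does not guarantee convergence to the true samples.

```python
import cmath
import math


def dft(x, K):
    """K-point DFT of x, zero-padded to length K."""
    xp = list(x) + [0.0] * (K - len(x))
    return [sum(xp[n] * cmath.exp(-2j * math.pi * k * n / K) for n in range(K))
            for k in range(K)]


def idft(X):
    K = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * n / K) for k in range(K)) / K
            for n in range(K)]


def iterative_extrapolate(mag, known, Nw, iters=50):
    """Alternate between the time- and frequency-domain constraints.

    mag:   given |F(k)| on a K-point grid (K >= 2*Nw - 1)
    known: the known samples f(0), ..., f(M-1)
    Returns the final estimate of f(0..Nw-1) and the magnitude error per pass.
    """
    K = len(mag)
    M = len(known)
    g = list(known) + [0.0] * (Nw - M)        # initial guess: unknowns set to zero
    errors = []
    for _ in range(iters):
        G = dft(g, K)
        errors.append(sum((a - abs(b)) ** 2 for a, b in zip(mag, G)))
        # Frequency domain: keep the current phase, impose the given magnitude.
        Gp = [a * cmath.exp(1j * cmath.phase(b)) for a, b in zip(mag, G)]
        gp = idft(Gp)
        # Time domain: real values, known samples restored, zero for n >= Nw.
        g = list(known) + [gp[n].real for n in range(M, Nw)]
    return g, errors


f = [1.0, -0.5, 0.25, 0.8, -0.4, 0.6]
K = 2 * len(f)                                 # satisfies K >= 2*Nw - 1
mag = [abs(v) for v in dft(f, K)]
g, errors = iterative_extrapolate(mag, f[:4], len(f), iters=40)
print(errors[0], errors[-1])
```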

The problem of mathematically showing whether or not the iterative procedure outlined in
Figure 4.2 converges to any kind of answer has not been addressed in this thesis. However, we
have empirically observed that the procedure appears to converge to the correct answer in many
cases. In other instances, the procedure does appear to converge, but not to the samples
we seek. In section 4.4 we will see that, for signal reconstruction from short-time spectral
magnitude, the failure to converge to the right answer in some of the short-time sections leads to
a reconstructed signal quite different from the original signal. On the other hand, for speech
signals, the reconstruction is quite successful in retaining most of the perceptual quality of the
original signal.

4.4  Reconstruction  Examples 

This section presents results of experiments conducted to test the reconstruction algorithms
of this chapter on speech. In particular, we have tested the algorithms on the short-time spectral
magnitude of the speech waveform in Figure 4.3. This waveform corresponds to the sentence
“The bowl dropped from his hand”, spoken by a female speaker. The processing was carried out
on a PDP 11/50 with floating-point arithmetic. For this processing, the waveform is sampled at
10 kHz and quantized to 12 bits.

Fig. 4.2 Iterative Extrapolation

In the first experiment, the goal is to reconstruct the signal from S_w(n, ω) using the sequential
extrapolation approach based on the proof of Theorem 2.1. Specifically, the one unknown
sample in each short-time section is solved for by using just one sample from the autocorrelation
of the same short-time section. This approach was applied with rectangular as well as Hamming
analysis windows of various lengths. Using double-precision (64-bit) floating-point computation,
the reconstruction was successful to within the 12-bit precision of the original speech signal
of Figure 4.3. For the case of a 32-point rectangular window, the reconstruction from
S_w(n, ω) is shown in Figure 4.4. Signal reconstruction was also successfully accomplished for
cases in which the analysis window spacing L was slightly larger than unity. In these cases, we
applied the sequential extrapolation procedure based on the proof of Theorem 2.3 of chapter 2.
However, when the analysis window spacing was greater than 4, this reconstruction algorithm
failed very early in the signal. The failure appears to be caused by computational errors that
arise from successive divisions by very small signal values within a short-time section.

For larger analysis window spacings, we next tried signal reconstruction using the linear
version of the sequential least-squares technique of section 4.2. The analysis window is a 128-point
rectangular window. Using double precision, the computation was quite successful for window
spacings up to L = 30. For example, the reconstruction for L = 20 is shown in Figure 4.5. However,
as L approaches N_w/2 = 64, there are too few extra autocorrelation coefficients to make
the computation robust. The algorithm therefore fails for values of L much higher than 40.

Finally, we applied the sequential iterative algorithm to reconstruct signals with large
analysis window spacing. As indicated in section 4.3, this algorithm does not necessarily
reconstruct the signal exactly. However, for speech applications the signal obtained from the
iterative algorithm is perceptually close to the original signal. For example, Figure 4.6 shows the
reconstruction of the speech in Figure 4.3 using the iterative reconstruction algorithm.


Fig.  4.3  Test  Speech  Waveform 


Fig.  4.5  Sequential  Least-Squares  Reconstruction 


Fig. 4.6 Sequential Iterative Reconstruction (128-point rectangular analysis window, window spacing L = 64)

4.5  Simultaneous  Extrapolation  Approach 

The emphasis in this thesis is on the sequential extrapolation algorithms of the previous
sections for signal reconstruction from short-time spectral magnitude. However, other approaches
can be designed for reconstructing a signal from its short-time spectral magnitude. In this section,
we outline an approach which we refer to as simultaneous extrapolation. The main idea in this
approach is to use the spectral magnitudes of several (possibly all) short-time sections to
determine their unknown samples simultaneously. This is in contrast to the sequential extrapolation
approach, where each short-time section is extrapolated only on the basis of its own spectral
magnitude. Of course, we have seen that the spectral magnitude of just the one section is sufficient
to uniquely extrapolate the section under the conditions we have been assuming in this chapter.
However, in case of errors or purposeful modifications in the short-time spectral magnitude, we have
seen previously that it is useful to incorporate extra information in the reconstruction procedures.
For example, the least-squares and iterative techniques of sections 4.2 and 4.3 used much more
of the spectral magnitude of each short-time section than the techniques based on the proofs of
the theorems in chapter 2. In the simultaneous extrapolation approach, we also wish to incorporate
the spectral magnitude information of other short-time sections in the extrapolation of
any particular short-time section.

We will illustrate the simultaneous extrapolation approach by developing it as an extension
of the least-squares technique of section 4.2. The problem is to reconstruct a finite-length signal
x(n) from its short-time spectral magnitude, S_w(nL, ω), under the conditions developed in
chapter 3. In section 4.2, we showed that a set of L equations can be developed for the L
unknowns in each short-time section using least-squares error criteria. These equations were either
cubic or linear, according to the particular error criterion used. Since the short-time sections
overlap with each other, solving for those L samples in each short-time section was shown to be
sufficient to reconstruct x(n). However, in section 4.2 we solved the equations separately for
each short-time section. In solving those equations, we used the already determined samples of
the short-time section immediately preceding it in time. Clearly, such a solution neglects the
structure of the short-time spectrum that is contained in the overlap of any particular short-time
section with the short-time section that follows it in time. This structure is important to exploit
when there are errors in the short-time spectral magnitude. In fact, the structure of the short-time
spectrum extends over the entire time duration of the signal because of the overlap between all the
short-time sections. Therefore, in the simultaneous extrapolation approach, we simultaneously
solve several sets of L equations corresponding to a set of overlapping short-time sections. In the
extreme, one may solve all the sets of equations for the entire signal simultaneously. However,
this would generally be computationally prohibitive.
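The thesis does not give a formula for the simultaneous criterion; one natural choice, assumed here purely for illustration, sums the error (4.10) of every short-time section evaluated on a single candidate signal. The Python sketch below (names hypothetical) evaluates that sum term by term and shows the coupling that the sequential method ignores: perturbing one sample changes the error of every section that contains it. Minimizing the total over all unknown samples at once would require a nonlinear least-squares solver, which is not shown.

```python
def autocorr(f):
    N = len(f)
    return [sum(f[k] * f[k + m] for k in range(N - m)) for m in range(N)]


def section(x, w, L, n):
    """Windowed section x(m) w(nL - m) for m = nL - Nw + 1, ..., nL."""
    Nw = len(w)
    return [x[m] * w[n * L - m] if 0 <= m < len(x) else 0.0
            for m in range(n * L - Nw + 1, n * L + 1)]


def section_errors(x_est, targets, w, L):
    """Per-section (4.10)-style errors, all evaluated on one candidate signal."""
    return [sum((r - s) ** 2 for r, s in zip(rn, autocorr(section(x_est, w, L, n))))
            for n, rn in enumerate(targets)]


x = [1.0, 0.5, -0.3, 0.8, 0.2, -0.6, 0.4, 0.1]
w, L = [1.0, 0.9, 0.8, 0.7], 2
num_sections = (len(x) + len(w)) // L + 1
targets = [autocorr(section(x, w, L, n)) for n in range(num_sections)]

x_bad = list(x)
x_bad[5] += 0.3                                        # perturb a single sample
errs = section_errors(x_bad, targets, w, L)
print([n for n, e in enumerate(errs) if e > 1e-12])    # [3, 4]: sections containing x(5)
print(sum(section_errors(x, targets, w, L)))           # 0.0 for the true signal
```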

The simultaneous extrapolation techniques have not been implemented for this thesis. However,
since they exploit more of the structure of the short-time spectrum, they are expected to
perform better than the sequential extrapolation techniques. On the other hand, we will see in
the following chapters that the sequential techniques perform quite reasonably in various speech
processing applications. The sequential techniques generally have the advantage of a simpler
computational structure.
CHAPTER FIVE: SIGNAL ESTIMATION FROM
MODIFIED  SHORT-TIME  SPECTRAL  MAGNITUDE 

In many applications it is desirable to modify the short-time spectral magnitude of a signal.
For example, to smooth a noisy signal, the spectral magnitudes of the short-time sections may be
filtered independently according to their frequency characteristics. As discussed in section 5.1,
the structure of the short-time spectrum is very sensitive to such modifications [3,5]; the modified
function is generally not a valid short-time spectrum. It is of interest to estimate a signal in some
reasonable way from the modified short-time spectral magnitude. For example, we would like
to obtain a smoothed signal estimate from the filtered short-time spectral magnitude of a noisy
signal.

In section 5.2, we consider the issues involved in applying the signal reconstruction algorithms
of the previous chapter to signal estimation from modified short-time spectral magnitude.
In section 5.3, we discuss certain artifacts associated with signal estimates from modified
short-time spectral magnitude. Specifically, such estimates may contain abrupt changes at certain
locations corresponding to the boundaries of short-time sections. The discussion includes possible
ways of suppressing these artifacts. In fact, for speech processing we find that the sequential
iterative algorithm of the previous chapter is quite successful in suppressing the artifacts.
Furthermore, even better performance is to be expected from simultaneous extrapolation algorithms.

5.1 Short-Time Spectral Structure

An  arbitrary  function  of  time  and  frequency  does  not  necessarily  represent  the  short-time 
spectral  magnitude  of  a  signal  [3,5].  This  is  because  the  definition  of  the  short-time  spectrum 
imposes  a  structure  on  its  time  and  frequency  variations.  To  see  this  structure,  let  us  examine  the 
definition  of  the  short-time  spectrum 

$$X_w(nL,\omega) = \sum_{m=-\infty}^{\infty} x(m)\,w(nL-m)\,e^{-j\omega m} \tag{5.1}$$
This expression for X_w(nL, ω) can be viewed, for a fixed ω, as the convolution in n of
x(n)e^{−jωn} with w(n), evaluated at time nL. On the other hand, for a fixed n, we can view
X_w(nL, ω) as a convolution in frequency through the following equivalent definition [5]:

$$X_w(nL,\omega) = \frac{1}{2\pi}\int_{-\pi}^{\pi} X(\theta)\,W(\theta-\omega)\,e^{j(\theta-\omega)nL}\,d\theta$$

where X(ω) and W(ω) are the Fourier transforms of x(n) and w(n), respectively.
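The equivalence of the two definitions follows by substituting the inverse Fourier transform of x(m) into (5.1) and interchanging the sum and the integral, which is harmless for a finite-length window; the third line uses the substitution k = nL − m:

```latex
\begin{aligned}
X_w(nL,\omega)
  &= \sum_{m} x(m)\,w(nL-m)\,e^{-j\omega m},
     \qquad x(m) = \frac{1}{2\pi}\int_{-\pi}^{\pi} X(\theta)\,e^{j\theta m}\,d\theta \\
  &= \frac{1}{2\pi}\int_{-\pi}^{\pi} X(\theta) \sum_{m} w(nL-m)\,e^{j(\theta-\omega)m}\,d\theta \\
  &= \frac{1}{2\pi}\int_{-\pi}^{\pi} X(\theta)\, e^{j(\theta-\omega)nL} \sum_{k} w(k)\,e^{-j(\theta-\omega)k}\,d\theta \\
  &= \frac{1}{2\pi}\int_{-\pi}^{\pi} X(\theta)\,W(\theta-\omega)\,e^{j(\theta-\omega)nL}\,d\theta .
\end{aligned}
```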

Another illustration of the structure in the short-time spectrum is obtained from the
interpretation of X_w(nL, ω) as a collection of Fourier transforms obtained as the window w(−n)
slides across x(n). In particular, consider the case when the analysis window w(n) is unity over
0 ≤ n < N and zero otherwise. Then X_w(n′L, ω) for a particular n = n′ is the Fourier transform of
the portion of x(n) over n′L − N < n ≤ n′L. Similarly, X_w((n′+1)L, ω) is the Fourier transform of
the portion of x(n) over (n′+1)L − N < n ≤ (n′+1)L. Then, the inverse Fourier transforms of
X_w(n′L, ω) and X_w((n′+1)L, ω) are the same over (n′+1)L − N < n ≤ n′L. Clearly, any two
arbitrary Fourier transforms are unlikely to have such a property. Similarly, given two arbitrary
Fourier transform magnitudes, it is unlikely that any of the various sequences corresponding to
one Fourier transform magnitude overlaps in the desired way with any of the sequences
corresponding to the other Fourier transform magnitude.
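For the rectangular window, the overlap property can be tested numerically. The Python sketch below (names hypothetical) samples each X_w(nL, ω) at K ≥ N frequencies, so that each frame's inverse DFT returns its section without aliasing, and checks whether adjacent frames agree on their overlap. A valid pair passes; after a modest, real-valued modification of one frame (the same gain applied to bins 1 and K − 1) the pair fails.

```python
import cmath
import math


def stft_frame(x, n, L, N, K):
    """X_w(nL, 2*pi*k/K), k = 0..K-1, rectangular window of length N (K >= N)."""
    return [sum(x[m] * cmath.exp(-2j * math.pi * k * m / K)
                for m in range(n * L - N + 1, n * L + 1) if 0 <= m < len(x))
            for k in range(K)]


def frame_to_section(X, n, L, N):
    """Inverse transform of one frame, read off on the section's support."""
    K = len(X)
    return {m: sum(X[k] * cmath.exp(2j * math.pi * k * m / K) for k in range(K)) / K
            for m in range(n * L - N + 1, n * L + 1)}


def adjacent_frames_consistent(Xa, Xb, n, L, N, tol=1e-9):
    """True if frames n and n+1 describe the same samples on their overlap."""
    sa = frame_to_section(Xa, n, L, N)
    sb = frame_to_section(Xb, n + 1, L, N)
    return all(abs(sa[m] - sb[m]) < tol for m in sa if m in sb)


x = [float(v) for v in range(1, 9)]
N, L, K = 4, 2, 4
X2, X3 = stft_frame(x, 2, L, N, K), stft_frame(x, 3, L, N, K)
print(adjacent_frames_consistent(X2, X3, 2, L, N))       # True
X3_mod = list(X3)
X3_mod[1] *= 2        # same gain on bins 1 and K-1 keeps the section real
X3_mod[K - 1] *= 2
print(adjacent_frames_consistent(X2, X3_mod, 2, L, N))   # False
```

The magnitude-only case has no such direct test, since each magnitude is shared by many sequences; that is the difficulty described in the paragraph above.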

5.2  Signal  Estimation  Algorithms 

As in previous chapters, we define the short-time spectral magnitude of a sequence x(n) by

$$S_w(nL,\omega) = \left|\sum_{m=-\infty}^{\infty} x(m)\,w(nL-m)\,e^{-j\omega m}\right|^2$$

In chapters 3 and 4, we found sufficient conditions and corresponding algorithms for
reconstructing the signal x(n) from S_w(nL, ω). In this section, we consider using those
reconstruction algorithms to obtain signal estimates from modified versions of S_w(nL, ω). We will
denote any modified version of S_w(nL, ω) by M_w(nL, ω).
From section 5.1, we know that M_w(nL, ω) is generally not a valid short-time spectral
magnitude. Consequently, any algorithm that relies critically on the validity of the short-time
spectral magnitude tends to perform poorly. This is the case, for example, for the sequential
extrapolation algorithms that use the extrapolation techniques in the proofs of Theorems 2.1 and 2.3
of chapter 2. In those algorithms, only a part of the autocorrelation of each short-time section is
used for extrapolation of the unknown samples. This ensures that the extrapolated samples of each
short-time section are consistent with just a portion of that section's autocorrelation. When the
spectral magnitude is unmodified, the remaining portion of the section's autocorrelation is also
consistent with the extrapolation. However, if the spectral magnitude of the short-time section is
modified, there is no guarantee that the extrapolated samples will be consistent with the unused
portion of the autocorrelation. It is therefore desirable in such cases to use algorithms that
extrapolate each short-time section in a way that ensures as much consistency as possible with the
given autocorrelation. The least-squares and iterative extrapolation algorithms of the previous
chapter were designed for this purpose. Therefore, we will use the same techniques for signal
estimation from modified short-time spectral magnitude.

Another difficulty encountered in applying the techniques of chapter 4 to signal estimation
from M_w(nL, ω) is that those techniques require a priori knowledge of some initial samples of
the signal estimate. Our approach is to use a reasonable guess for those initial samples. For
example, the initial samples of the unprocessed signal, if known, may be used. The techniques of
chapter 4 were designed to be relatively insensitive to modifications in the known information,
such as the initial samples. We find this to be the case in speech applications such as those
discussed in chapters 6 and 7. The effect of errors in the initial signal samples, as well as errors
in the short-time structure of M_w(nL, ω), is discussed in the next section.
5.3 Short-Time Boundary Artifacts

As discussed in section 5.1, processing the short-time spectral magnitude results in a
function that in general does not correspond to the short-time spectral magnitude of any signal.
Furthermore, the algorithms of chapter 4 for signal reconstruction from the magnitude of the
short-time spectrum require a priori knowledge of a certain number of initial signal samples.
However, in most applications, such information is impossible to obtain accurately. As a result
of such inaccuracies, certain artifacts arise at the boundaries of short-time sections in signal
estimates from the modified short-time spectral magnitude. In particular, we find abrupt changes in
signal value at certain locations corresponding to the boundaries of short-time sections. In this
section, we will study the origin of these artifacts and discuss ways of avoiding them. In fact, in
chapters 6 and 7 we will see that the sequential iterative algorithm of chapter 4 performs quite
well in this regard for speech processing applications.

To investigate the cause of such artifacts, let us consider a discrete-time signal x(n) with
short-time spectrum X_w(nL, ω). We now replace the short-time spectral phase of X_w(nL, ω) by
some other arbitrarily selected phase function. In particular, consider any two overlapping
short-time sections of x(n). When the spectral phases of these sections are replaced by some
other phase functions, the time distribution of the two short-time sections changes. This
distribution may range anywhere from minimum-phase energy (concentrated near smaller values of
n in the short-time section) to maximum-phase energy (concentrated near larger values of n in
the short-time section). Consequently, if the new phase functions are selected arbitrarily, there
is no guarantee that their time distributions will match at the boundaries of the short-time
sections. For example, in Figure 5.1 we show the test waveform of Figure 4.3 with its short-time
spectral phase replaced by zero. The analysis window is a 128-point rectangular window.
Clearly, there are very abrupt transitions within this signal that were not present in the original
signal of Figure 4.3. Furthermore, the abrupt changes occur periodically, and they correspond to
the boundaries of short-time sections.
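The effect in Figure 5.1 can be reproduced in miniature. In the Python sketch below, the window length, spacing, test ramp and function names are assumptions chosen to keep the numbers small. Each section's phase is set to zero, which places the largest value of every inverse section at its first sample (a zero-phase sequence satisfies |h(i)| ≤ h(0) = (1/K)Σ_k |X(k)|), and the sections are overlap-added with the gain L/N, which satisfies (4.8) for this rectangular window. With the true phase the ramp is recovered; with zero phase the interior output alternates about it by ±1 from one sample to the next, a small-scale version of the periodic boundary artifacts described above.

```python
import cmath
import math


def dft(v):
    K = len(v)
    return [sum(v[i] * cmath.exp(-2j * math.pi * k * i / K) for i in range(K))
            for k in range(K)]


def idft_real(V):
    K = len(V)
    return [(sum(V[k] * cmath.exp(2j * math.pi * k * i / K) for k in range(K)) / K).real
            for i in range(K)]


def resynthesize(x, N, L, zero_phase):
    """Analyse x with an N-point rectangular window at spacing L (one N-point DFT
    per section), optionally set each section's phase to zero, and overlap-add
    the inverse sections with gain L/N."""
    y = [0.0] * len(x)
    for n in range((len(x) + N) // L + 1):
        start = n * L - N + 1
        sec = [x[m] if 0 <= m < len(x) else 0.0 for m in range(start, start + N)]
        X = dft(sec)
        if zero_phase:
            X = [abs(v) for v in X]
        for i, v in enumerate(idft_real(X)):
            if 0 <= start + i < len(x):
                y[start + i] += v * L / N
    return y


x = [float(v) for v in range(1, 13)]
exact = resynthesize(x, 4, 2, zero_phase=False)
zero = resynthesize(x, 4, 2, zero_phase=True)
print(max(abs(a - b) for a, b in zip(exact, x)))      # round-off only
print([round(b - a, 3) for a, b in zip(x, zero)])     # deviations caused by zero phase
```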
Fig. 5.1 Effect of Zero Short-Time Spectral Phase
We have thus seen that, given any short-time spectral magnitude, selecting a phase function
for it arbitrarily can give rise to short-time boundary artifacts. It is therefore important that
any algorithm for signal estimation from short-time spectral magnitude attempt to select the
phase function in a way that minimizes the short-time boundary artifacts. Clearly, if there is no
error in the input to a reconstruction algorithm and if there is no computational error, the
algorithm will select the unique phase function corresponding to the short-time spectral magnitude.
There will therefore be no short-time boundary artifacts.

We have seen that in various short-time spectral processing applications, the processed short-time spectral magnitude does not correspond to the short-time spectral magnitude of any signal. Furthermore, the initial samples of the processed signal are usually impossible to determine exactly. Consequently, the time distribution of the first short-time section is generally incorrect. It is important that any algorithm for signal estimation choose the remaining short-time sections in a way that minimizes the short-time boundary artifacts.

In this thesis we have found that the sequential iterative algorithm significantly suppresses short-time boundary artifacts in speech applications. However, the algorithm is limited by its sequential nature. Specifically, short-time sections are aligned by considering pairs of sections independently, in an order determined by their location on the time axis. Thus, given the time distribution of one short-time section, the distribution of the short-time section immediately following it is determined. In aligning the two sections, no information about the other short-time sections is incorporated, so short-time boundary artifacts are minimized only over localized regions of the short-time spectral magnitude. We expect that the performance of sequential algorithms can be improved upon by simultaneous extrapolation algorithms.
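A minimal sketch of a sequential iterative estimator of this kind, written under our own assumptions rather than reproducing the algorithm of chapter 4: sections x[kL : kL+N]·w start every L samples (N > L), the first section is known exactly, and each later section is estimated by alternating between imposing its given DFT magnitude and restoring the N − L samples already fixed by earlier sections. Only the L new samples of each section are ever updated, which is the localized, pairwise alignment described above. The helper names `stft_magnitude` and `sequential_estimate`, the zero initial guess, the Hamming window, the DFT length and the iteration count are all illustrative choices.

```python
import numpy as np


def stft_magnitude(x, w, L, nfft):
    """Hypothetical helper: |DFT| of sections x[kL : kL+N] * w, one row per section."""
    N = len(w)
    K = (len(x) - N) // L + 1
    frames = np.stack([x[k * L:k * L + N] * w for k in range(K)])
    return np.abs(np.fft.rfft(frames, nfft, axis=1))


def sequential_estimate(mags, w, L, first_section, nfft, n_iter=50):
    """Hypothetical sketch: estimate a signal section by section from magnitudes.

    mags          -- array (K, nfft // 2 + 1); row k is the section starting at k*L
    first_section -- the N samples of section 0, assumed known
    """
    N = len(w)
    known = N - L                          # samples shared with earlier sections
    K = mags.shape[0]
    y = np.zeros((K - 1) * L + N)
    y[:N] = first_section
    for k in range(1, K):
        seg = y[k * L:k * L + N].copy()    # new samples start as zeros (initial guess)
        for _ in range(n_iter):
            spec = np.fft.rfft(seg * w, nfft)
            # Impose the given magnitude, keep the current phase estimate.
            est = np.fft.irfft(mags[k] * np.exp(1j * np.angle(spec)), nfft)[:N]
            # Update only the L new samples; undo the window (non-zero there).
            seg[known:] = est[known:] / w[known:]
        y[k * L + known:k * L + N] = seg[known:]
    return y


rng = np.random.default_rng(1)
N, L, nfft = 64, 16, 256
w = np.hamming(N)
x = np.convolve(rng.standard_normal(400), np.ones(5) / 5, mode="same")  # toy signal
mags = stft_magnitude(x, w, L, nfft)
y = sequential_estimate(mags, w, L, x[:N], nfft)
print(np.max(np.abs(y - x[:len(y)])))     # reconstruction error of this sketch
```

Because each section is fixed before the next is considered, an error made early is never revisited; a simultaneous scheme would instead update all sections jointly.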

Although sequential extrapolation algorithms can be designed to significantly suppress short-time boundary artifacts, they often do not yield a time distribution within the short-time sections that is consistent with that of the unprocessed signal. For example, when noise reduction processing is applied to speech in chapter 7, we find that the detailed shapes within short-time sections of the processed signal differ significantly from those of the original undegraded signal. However, the sections do preserve such important attributes as the periodicity of the voiced sections. In fact, perceptually the processed speech is almost identical to the original undegraded speech. We conclude that although the actual short-time spectral phase is not important for speech perception, it is essential that the phase be chosen so as to avoid short-time boundary artifacts. Finally, the significant change in detailed short-time signal shapes seems to be largely a consequence of the sequential character of the algorithms implemented for this thesis. For applications where such a change in detailed shape is not acceptable, it is suggested that simultaneous extrapolation algorithms be
CHAPTER SIX: TIME-SCALE MODIFICATION


6.1 Introduction

Signal estimation from short-time spectral magnitude is applied in this chapter to the problem of time-scale modification of speech. Time-scale modification procedures aim to maintain the perceptual quality of the original speech while changing the apparent rate of articulation. This is essentially equivalent to preserving the instantaneous frequency locations while changing their rate of change in time. Efficient procedures for such processing have a number of important applications. Controls for time-scale modification on a tape recorder, for example, would allow users to pace the playback according to their own convenience. Thus, sections of the recording could be scanned rapidly or played slowly, depending on the listener's needs. This gives the recorded medium a flexibility that previously only printed text could provide. For the blind, this is a particularly encouraging prospect, since even normal recorded speech offers a "reading rate" that is typically 2 to 3 times that of Braille [12].

Efficient time-scale modification of speech is also applicable to signal coding/decoding and to speech recognition systems. In the former case, speech may be time-compressed at the coding stage to reduce the data rate and then appropriately time-expanded at the decoding stage. In speech recognition systems, time-scale modification could be used to normalize the duration of utterances before recognition algorithms are applied.

A simple time scaling that replaces x(n) by x(αn) introduces significant degradation for the above applications. Such a scaling is obtained, for example, when a recording is played back faster than the original recording rate. The resulting "Mickey Mouse" effect is amusing but distorts most of the original perceptual characteristics of the speech. The degradation is caused by changes in the pitch of voiced sections and by shifting of the vocal tract resonances (formants). More sophisticated time-scale modification techniques are therefore required to keep the pitch and formant locations as invariant as possible.
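A quick numerical check of the pitch shift: with α = 2, x(2n) keeps every other sample, and playing it back at the original sampling rate doubles every frequency, including the pitch of voiced sections and the formant locations. The sampling rate, tone frequency and the helper name `dominant_frequency` below are arbitrary illustrative assumptions.

```python
import numpy as np

fs = 8000.0                    # assumed sampling rate (Hz)
f0 = 200.0                     # assumed pitch of a voiced segment (Hz)
n = np.arange(4096)
x = np.sin(2 * np.pi * f0 * n / fs)


def dominant_frequency(sig, fs):
    """Frequency (Hz) of the largest DFT peak, ignoring DC (hypothetical helper)."""
    spec = np.abs(np.fft.rfft(sig))
    spec[0] = 0.0
    return np.argmax(spec) * fs / len(sig)


x_fast = x[::2]                # x(2n), played back at the same rate fs
print(dominant_frequency(x, fs), dominant_frequency(x_fast, fs))
```

The printed peak of x_fast is twice that of x, which is the pitch and formant shift heard as the "Mickey Mouse" effect.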
Many of the techniques devised in the past for time-scale modification of speech are based on an approach first used in a technique known as Fairbanks' method [13]. This approach and the various techniques based on it are described in section 6.2. An alternative approach, based on the Phase Vocoder [14], was used by Portnoff [7] to develop a very successful time-scale modification technique. This approach, outlined in section 6.3, processes both the magnitude and the phase of the short-time spectrum. The resulting time-scale modification is generally considered to be of acceptable quality [7,15] for many applications. From a practical point of view, however, this technique has the major disadvantage of a complicated computational structure.

The time-scale modification procedure developed in section 6.4 combines the techniques for signal estimation from short-time spectral magnitude with the basic idea behind Fairbanks' method. The resulting time-scale modifications are found to be comparable to those achieved with the Phase Vocoder technique developed by Portnoff. However, the technique proposed in section 6.4 has a much simpler computational structure and can be used to design practical time-scale modification systems.


6.2 Fairbanks' Approach

Fairbanks' approach [13] to time-scale modification consists mainly of discarding or replicating short-time sections of the speech, depending on whether time compression or time expansion is desired. Provided the short-time sections are short enough, portions of all the phonemes [2] are preserved but their durations are changed. Furthermore, the pitch and formants in the voiced sections are retained. A major difficulty, however, is that the transitions between the short-time sections are not smooth. These sharp transitions introduce a periodic degradation at the frame rate, perceived as a "burbling" distortion [12].
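The core of Fairbanks' approach for a factor of two can be sketched as below: split the signal into fixed-length sections, keep every other section for compression or play each section twice for expansion, and concatenate the results. The section length and the function name `fairbanks` are illustrative assumptions. No pitch information or smoothing is used, so the joins between sections are exactly where the burbling distortion described above arises.

```python
import numpy as np


def fairbanks(x, section_len, mode):
    """Hypothetical Fairbanks-style time-scale change by a factor of two.

    mode='compress': discard every other section.
    mode='expand':   play every section twice.
    Sections are simply concatenated, with no smoothing at the joins.
    """
    n_sections = len(x) // section_len
    sections = [x[i * section_len:(i + 1) * section_len] for i in range(n_sections)]
    if mode == "compress":
        kept = sections[::2]
    elif mode == "expand":
        kept = [s for s in sections for _ in range(2)]
    else:
        raise ValueError("mode must be 'compress' or 'expand'")
    return np.concatenate(kept)


x = np.sin(2 * np.pi * 0.013 * np.arange(8000))     # toy "voiced" signal
print(len(fairbanks(x, 200, "compress")), len(fairbanks(x, 200, "expand")))  # 4000 16000
```

Within each retained section the waveform, and hence its pitch, is untouched; only the duration changes.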

Various strategies based on pitch detection have been designed to overcome the problem of non-smooth transitions in Fairbanks' approach. These include the pitch-synchronous technique [26,18] and the pseudo-pitch-synchronous technique [17]. The pitch-synchronous technique chooses the short-time sections so that they correspond to multiples of the pitch period in voiced sections. This ensures a smoother transition between adjacent short-time sections. To select such short-time sections, it is necessary to first apply pitch marking algorithms. Any errors in the pitch marking introduce objectionable artifacts in the speech [28]. This is a particular problem with noisy speech, since pitch marking algorithms perform very poorly in that case. The pseudo-pitch-synchronous technique attempts to avoid this problem by requiring only a rough estimate of the pitch period. The algorithm repeats or discards sections of the speech equal in length to the average pitch period, then smooths together the edges of the remaining sections. This algorithm performs better than the pitch-synchronous method, particularly in the presence of noise.

The desire for a time-scale modification technique that does not depend on pitch extraction and voiced/unvoiced decisions prompted the work on Phase Vocoder based techniques [7,12]. This led to the development of the very successful technique described in the next section.

6.3 Phase Vocoder Approach

An alternative to the Fairbanks time-scale modification approach is to use classical vocoder techniques. The speech is coded in the vocoder analysis stage with time-dependent parameters, and the idea is to time-scale those parameters appropriately before the speech signal is resynthesized. However, most of the classical vocoder techniques require voiced/unvoiced decisions and pitch extraction. Thus, any time-scale modification technique based on such vocoders would suffer the same kind of pitch detection artifacts as those found in the pitch-synchronous refinements of Fairbanks' approach. One exception is the Phase Vocoder [14,2,19,20]. This vocoder represents the speech signal by its short-time spectrum (both magnitude and phase). Furthermore, it does not require voiced/unvoiced decisions or any pitch extraction procedures.
Portnoff [7,12] has developed a very successful time-scale modification technique based on the Phase Vocoder. To this end, he first developed a mathematical representation for the sampled speech signal based on the usual model of speech production. This representation is used as the basis for a definition of rate-changed speech. Finally, Portnoff showed how the short-time spectral representation used in the Phase Vocoder provides a mechanism for modifying the speech time scale. In the remainder of this section we briefly outline Portnoff's procedure for obtaining time-scale modified speech from the short-time spectrum.

Let x(n) be the discrete-time signal to be time-scale modified by a factor of ρ. We restrict ρ to be a rational number. This is not a practical restriction, since any real number can be approximated by a rational number with arbitrary precision. In the Phase Vocoder approach, x(n) is first transformed into its short-time spectrum at M frequency locations chosen appropriately to avoid aliasing [7,12]. In the expression below for the short-time spectrum, we assume that ω is evaluated at just those M frequency values.

X_w(n\rho, \omega) = \sum_{m=-\infty}^{\infty} x(m)\, w(n\rho - m)\, e^{-j\omega m}

If ρ is not an integer, this computation is accomplished through an interpolation procedure [2]. The next step is to estimate the unwrapped phase [1] of X_w(nρ, ω). A good description of the phase estimation process is given in [15]. For time-scale modification, we want the pitch frequency locations to remain the same but their time variation to change by the factor ρ. Portnoff showed that this can be accomplished by dividing the unwrapped phase of X_w(nρ, ω) by ρ. After this division, the Phase Vocoder approach synthesizes the time-scale modified speech from the processed short-time spectrum.
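The phase modification step can be sketched in a few lines, assuming the short-time spectrum has already been computed at the M analysis frequencies and stored as a complex array with one row per analysis time nρ. Here the phase of each frequency channel is unwrapped along time with `numpy.unwrap`, a simple stand-in for the more careful phase estimation described in [15], and then divided by ρ. The interpolation for non-integer ρ and the resynthesis stage are not shown, and the function name `scale_phase` is hypothetical.

```python
import numpy as np


def scale_phase(X, rho):
    """Divide the unwrapped short-time phase by rho (hypothetical helper).

    X   -- complex array, shape (frames, M): one row per analysis time n*rho
    rho -- rational time-scale factor
    """
    mag = np.abs(X)
    # Unwrap each frequency channel's phase along the time (frame) axis.
    phase = np.unwrap(np.angle(X), axis=0)
    return mag * np.exp(1j * phase / rho)


frames = np.arange(20)
X = np.exp(1j * 0.5 * frames)[:, None] * np.ones((1, 4))   # toy spectrum, 4 channels
Y = scale_phase(X, 2.0)
step = np.angle(Y[1:, 0] / Y[:-1, 0])
print(np.allclose(step, 0.25))   # True
```

For a channel whose phase advances by 0.5 rad per frame, the scaled spectrum advances by 0.25 rad per frame when ρ = 2, while the magnitude is unchanged.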

A major problem with the Phase Vocoder approach is its computational complexity. It generally requires sophisticated indexing and rather large memory space for its implementation. For example, Portnoff had to introduce significant memory management to implement the technique on a PDP 11/50. Holtzman [15], on the other hand, has developed an alternative implementation that significantly reduces the memory requirements, but at the expense of greater programming complexity. The technique presented in the next section is based on iterative signal estimation from the short-time spectral magnitude. Compared to the Phase Vocoder approach, that technique has considerable computational advantages. Furthermore, it appears to have comparable performance in terms of the quality of the time-scale modified speech.


6.4 Short-Time Spectral Magnitude Approach

This section describes a technique for time-scale modification of speech using signal estimation from modified short-time spectral magnitude. The performance of this technique appears to be comparable to that of Portnoff's technique. On the other hand, as noted before, this scheme is computationally much simpler and requires very little memory.

The basic idea of the technique in this section is similar to Fairbanks' approach, in which various short-time sections are discarded or replicated according to whether compression or expansion is desired. The difference is that here it is the spectral magnitudes of the short-time sections that are discarded or repeated in the short-time spectral magnitude of the speech. In Fairbanks' approach, the remaining short-time sections are merely concatenated, possibly taking into account any pitch information that may be available. In contrast, the strategy here is to regard the set of spectral magnitudes of the remaining short-time sections as a modified short-time spectral magnitude. The techniques of signal estimation from modified short-time spectral magnitude are then used to obtain the time-scale modified signal. As discussed in chapter 5, such signal estimation techniques can be designed to significantly suppress the short-time boundary artifacts in the signal estimate. This results in the kind of alignment between short-time sections that the pitch-synchronous implementations of Fairbanks' approach attempt to achieve. However, in contrast to those techniques, this approach does not depend on pitch detection or pitch marking algorithms. It therefore tends to be much more robust to noise in the speech.
For the signal estimation component of our approach to time-scale modification, we have found the sequential iterative technique of chapter 4 to be particularly attractive for speech. It combines a simple computational structure with high-quality performance in the tests we have conducted. As an example, consider time compression of the test sentence in this thesis: "The bowl dropped from his hand." The entire waveform of the sentence is shown in Figure 4.3. The short-time spectral magnitude of the waveform is computed with a 128-point Hamming window and a window spacing L of 32. Every other spectral magnitude is discarded to obtain a 2:1 time compression. The iterative algorithm of chapter 4 then yields the waveform shown in Figure 6.1. Clearly, the duration of the sentence has been halved. Furthermore, the pitch of the various segments is the same as in Figure 4.3, and there are very few short-time boundary artifacts.
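The frame-discarding step of this 2:1 compression example can be sketched as follows, assuming sections x[kL : kL+N]·w that start every L = 32 samples with a 128-point Hamming window. The 256-point DFT length, the random stand-in for the speech waveform and the helper names `stft_magnitude` and `compress_magnitude` are our assumptions. The retained rows are still interpreted with spacing L = 32, so they form a modified short-time spectral magnitude that would then be passed to a signal estimation algorithm such as the sequential iterative technique of chapter 4 (not shown here).

```python
import numpy as np


def stft_magnitude(x, w, L, nfft):
    """Hypothetical helper: |DFT| of sections x[kL : kL+N] * w, one row per section."""
    N = len(w)
    K = (len(x) - N) // L + 1
    frames = np.stack([x[k * L:k * L + N] * w for k in range(K)])
    return np.abs(np.fft.rfft(frames, nfft, axis=1))


def compress_magnitude(mags, keep=1, out_of=2):
    """Hypothetical helper: keep `keep` rows out of every `out_of`, discard the rest."""
    idx = [k for k in range(mags.shape[0]) if k % out_of < keep]
    return mags[idx]


N, L, nfft = 128, 32, 256
w = np.hamming(N)                                    # 128-point Hamming window
x = np.random.default_rng(0).standard_normal(16000)  # stand-in for the speech waveform
mags = stft_magnitude(x, w, L, nfft)
mods = compress_magnitude(mags)                      # every other magnitude discarded
print(mags.shape, mods.shape)
```

Other patterns follow from the same helper: `compress_magnitude(mags, 1, 3)` keeps one section in three, and `compress_magnitude(mags, 2, 3)` discards one in three.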

As another example, consider time expansion of the test sentence. In this case the short-time spectral magnitude is computed with a 128-point rectangular window and a window spacing L of 32. To obtain the time-scale modified signal estimate, this short-time spectral magnitude is treated as if it corresponded to a window spacing L of 64. Clearly, this results in a signal that is twice as long as the original signal. The result obtained using the iterative technique is shown in Figure 6.2. Once again, the pitch of the various segments is preserved and there are very few short-time boundary artifacts. The quality of the resulting speech is comparable to that obtained with Portnoff's technique.
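The length bookkeeping behind this reinterpretation can be made explicit: K sections of an N-point window spaced L samples apart span (K − 1)L + N samples, so reading magnitudes computed at L = 32 as if they were spaced 64 samples apart roughly doubles the duration. The input length of 16000 samples and the helper name `implied_length` are illustrative assumptions.

```python
def implied_length(num_sections, spacing, window_len):
    """Hypothetical helper: samples spanned by num_sections windows, spacing apart."""
    return (num_sections - 1) * spacing + window_len


N = 128                                   # rectangular analysis window length
n_samples = 16000                         # hypothetical input length
K = (n_samples - N) // 32 + 1             # sections analysed with L = 32
analysed = implied_length(K, 32, N)       # span of the analysed signal
resynth = implied_length(K, 64, N)        # same magnitudes read with L = 64
print(K, analysed, resynth, resynth / analysed)   # 497 16000 31872 1.992
```

The ratio approaches 2 as the number of sections grows, since the fixed window length N becomes negligible.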

Finally, note that rates of expansion and compression other than those used in the examples above can be obtained. For example, discarding two out of every three short-time segments results in time compression by a factor of 3. On the other hand, if one out of every three segments is discarded, the processed signal is two thirds as long as the original speech. Similarly, a time expansion by a factor of three can be obtained by computing the short-time spectral magnitude at a time-sampling rate three times higher than the rate corresponding to a window spacing of half the analysis window length. The result is then processed as if it had been sampled at that rate. Clearly,
CHAPTER SEVEN: NOISE REDUCTION

The problem of noise reduction arises in numerous signal processing contexts. The reduction of noise in a signal may be either the final goal or an intermediate step. For example, speech communication between a pilot and an air traffic control tower is typically degraded by background noise. In such a case, noise reduction is the final signal processing step for ensuring clear communication. On the other hand, a radar image may be processed for noise reduction only as a preliminary step toward target detection. In all such cases, noise reduction is an essential element of the entire system.

In this chapter, we consider the processing of signal-independent additive noise. Many problems in speech and image processing fall into this category. Furthermore, problems involving multiplicative or convolutional noise can be converted into additive noise problems by a homomorphic transformation [1,21]. Sometimes even signal-dependent noise can be converted to signal-independent additive noise. For example, pseudo-noise techniques [2S] have been used to transform in this way the quantization noise associated with PCM signal coding.

In the problem of noise filtering of speech or image data, short-time spectral processing is often preferable [6,22]. This is primarily because long-time filtering tends to smooth out local variations that are often important attributes of the signal. In contrast, short-time spectral processing attempts to preserve such attributes and is therefore generally considered a better alternative. Section 7.1 discusses in greater detail the advantages of short-time spectral processing for noise reduction. A number of short-time spectral processing techniques exist for noise reduction in speech and images. The spectral subtraction technique has been shown [6] to have good performance and a relatively simple implementation. We describe the standard version of this technique in section 7.2. One characteristic of the standard spectral subtraction technique, and of other short-time spectral processing techniques for noise reduction, is that the short-time spectral phase of the noisy signal is left unprocessed.
In section 7.3, we present a modification of the short-time spectral subtraction technique in which signal estimation from the processed short-time spectral magnitude is used to obtain a processed version of the short-time spectral phase. This modification is also applicable to the other short-time spectral processing techniques mentioned in section 7.1. We find that the performance of the modified short-time spectral subtraction technique is comparable to that of standard short-time spectral subtraction. However, unlike the standard technique, the modified technique does not require the short-time spectral phase of the noisy signal.

Both the standard spectral subtraction technique and the modified version introduced in this thesis produce certain undesirable artifacts in the processed signal. In section 7.4, possible causes of these artifacts are discussed and techniques for suppressing them are developed. These artifact suppression techniques can be applied to both the standard and modified versions of short-time spectral subtraction.

7.1 Short-Time Spectral Processing Techniques

To establish a mathematical framework for our discussion, let s(n) denote the discrete-time signal we want to estimate from another signal x(n), which is the sum of s(n) and a noise signal e(n). It is assumed that e(n) is a sample sequence of a stationary stochastic process with known spectrum P_e(ω). Although for convenience we use the notation of one-dimensional signals, the entire discussion in this section is also applicable to multidimensional signals [24].

A number of classical techniques exist for filtering stationary stochastic processes [23]. In particular, a problem that has been widely considered is that of filtering additive stationary noise from stationary stochastic processes. This has resulted in the well-known non-causal Wiener filter and numerous related techniques. It is therefore not surprising that such techniques have been considered for the noise reduction problem in numerous application areas. The most successful applications are those where the desired signal s(n) can be adequately modelled as a sample sequence of a stationary stochastic process with a known spectrum. In areas such as speech and image processing, however, such a model is generally inadequate. For example, speech is commonly modelled as the output of a time-varying linear system driven by either white noise or quasi-periodic pulses [2]. It is therefore inappropriate to consider the output of such a system as a stationary process.

The linear system mentioned above for modelling speech signals is slowly time-varying. This has led investigators to consider short-time sections of speech signals to have stationary spectral characteristics. This approximation has been used successfully in a variety of engineering contexts, including noise reduction, speech synthesis, and bandwidth compression. Unfortunately, there is no comparable model for images. Inspection of any typical image shows many rapidly space-varying characteristics. In fact, much of the information in images lies in sharp changes such as those at the boundaries of objects. Characteristics of this type generally render useless any attempt at modelling images as outputs of stationary stochastic systems. However, except near sharp changes such as object boundaries, short-space (the 2-D equivalent of short-time) modelling of images with stationary processes has been relatively successful [22,24]. Furthermore, signal processing based on such short-space modelling often does not appreciably degrade object boundaries and other sharp details.

The short-time spectrum has proved to be particularly convenient for the short-time processing of speech and images. The central idea is to process the spectrum of each short-time section separately. Since the signals are assumed to be stationary at the short-time level, classical noise reduction techniques based on spectral filtering can be used. A common characteristic of all these techniques is that they yield zero-phase filters. Thus, the overall processing affects only the short-time spectral magnitude.

7.2 Standard Short-Time Spectral Subtraction

A number of short-time spectral processing techniques have been developed over the years for the reduction of additive noise in speech and image signals [6,24]. The performance of such techniques is generally of the same order as that of a technique known as short-time (or, for images, short-space) spectral subtraction. However, short-time spectral subtraction offers the advantage of a simpler implementation. In this section, we review the short-time spectral subtraction technique as it is generally implemented. As indicated in the previous section, such an implementation of short-time spectral processing retains the short-time spectral phase of the noisy signal. In section 7.3, on the other hand, we will use the theory and techniques of this thesis to develop a different implementation of short-time spectral subtraction, one that estimates a processed short-time spectral phase from the processed short-time spectral magnitude.

We first review the classical spectral subtraction procedure for processing stationary random signals without short-time techniques. Let s(n) be the stationary random signal we wish to estimate from another signal x(n), which is the sum of s(n) and uncorrelated noise e(n). Assume that the power spectral density P_e(ω) of e(n) is known. The power spectral density P_s(ω) of s(n) is then estimated from the observations of x(n) and the known P_e(ω). Specifically, since x(n) is the sum of s(n) and the uncorrelated e(n), it follows that

P_s(\omega) = P_x(\omega) - P_e(\omega)    (7.1)

A reasonable estimate of P_s(ω) is therefore obtained by subtracting the known spectrum P_e(ω) from an estimate of P_x(ω). The estimate of P_x(ω) is usually computed as the magnitude squared of the Fourier transform of the observed x(n). The subtraction sometimes gives negative values in the estimate of P_s(ω); the most common approach in such situations is to replace the negative values by zero [6]. Finally, the square root of the estimate of P_s(ω) is used as the Fourier transform magnitude of the estimate of s(n). This magnitude is combined with the Fourier transform phase of the noisy signal x(n) to yield the standard spectral subtraction estimate of the desired signal s(n). It has been shown [6] that this procedure implicitly performs a type of parametric Wiener filtering.
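A minimal sketch of this classical, whole-signal procedure follows. It assumes the noise power spectrum is supplied per `rfft` bin on the same scale as |X(k)|² (for white noise of variance σ², that is σ² times the transform length); the test signal, the noise level and the function name `spectral_subtract` are illustrative assumptions.

```python
import numpy as np


def spectral_subtract(x, noise_power):
    """Classical spectral subtraction over the whole observation x(n) (hypothetical helper).

    noise_power -- noise power per rfft bin, on the scale of |X(k)|**2 (assumption)
    """
    X = np.fft.rfft(x)
    Ps_hat = np.abs(X) ** 2 - noise_power        # estimate of P_s = P_x - P_e
    Ps_hat = np.maximum(Ps_hat, 0.0)             # negative values replaced by zero
    S_hat = np.sqrt(Ps_hat) * np.exp(1j * np.angle(X))   # reuse the noisy phase
    return np.fft.irfft(S_hat, len(x))


rng = np.random.default_rng(0)
n = np.arange(4096)
s = np.sin(2 * np.pi * 205 * n / n.size)        # toy "desired" signal
sigma = 0.3
x = s + sigma * rng.standard_normal(n.size)     # white noise of variance sigma**2
noise_power = sigma ** 2 * n.size * np.ones(n.size // 2 + 1)
s_hat = spectral_subtract(x, noise_power)
print(np.mean((x - s) ** 2), np.mean((s_hat - s) ** 2))
```

With a zero noise spectrum the function returns x unchanged, since the square root of |X(k)|² recombined with the noisy phase is X(k) itself.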

The implementation of the standard spectral subtraction technique using the short-time spectrum is relatively straightforward. The basic idea is to consider each short-time section as an observation of a stationary stochastic process and to apply the spectral subtraction procedure separately to each section. Thus, for example, if x(n) is the signal corrupted by additive noise e(n) and S_w(nL, ω) is the short-time spectral magnitude of x(n), the spectral subtraction procedure yields the following function:

\hat{S}_w(nL, \omega) = \begin{cases} S_w(nL, \omega) - \alpha P_e(\omega), & \text{if } S_w(nL, \omega) > \alpha P_e(\omega) \\ 0, & \text{otherwise} \end{cases}    (7.2)

where the parameter α controls the degree of noise smoothing to be achieved. In practice, values of α between 2 and 3 have been found to produce acceptable results [6,24].
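Rule (7.2) amounts to a thresholded subtraction that can be applied to all sections and frequencies at once. In this sketch the array layout (one row per section, one column per frequency) and the default α = 2, taken from the range quoted above, are our assumptions; the function name `subtract_rule` is hypothetical.

```python
import numpy as np


def subtract_rule(S, Pe, alpha=2.0):
    """Apply (7.2): S - alpha*Pe where S > alpha*Pe, and 0 otherwise (hypothetical helper).

    S  -- short-time spectral values, shape (sections, frequencies)
    Pe -- noise power spectral density sampled at the same frequencies
    """
    threshold = alpha * np.asarray(Pe)
    return np.where(S > threshold, S - threshold, 0.0)


S = np.array([[5.0, 1.0, 3.0]])
Pe = np.array([1.0, 1.0, 1.0])
print(subtract_rule(S, Pe))  # [[3. 0. 1.]]
```

Raising α removes more noise but also zeroes more low-level signal components, which is the smoothing trade-off the parameter controls.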

The analysis window w(n) and the sampling interval L are chosen so that

\sum_{k=-\infty}^{\infty} w(kL - n) = 1 \quad \text{for all } n    (7.3)

This is done to make the mapping back to the time domain easier. In the standard technique, the processed magnitude Ŝ_w(nL, ω) is combined with the short-time spectral phase of x(n) to give a function D_w(nL, ω). To map back to the time domain, we take the inverse Fourier transform of D_w(nL, ω) for each n. The time functions thus obtained are simply added to each other to give the spectral subtraction estimate of s(n). However, if (7.3) is not satisfied, some additional processing of the short-time sections is necessary before they are added, in order to avoid short-time boundary artifacts in the estimate of s(n).
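The standard short-time implementation can be sketched as follows. Several choices are our assumptions rather than the thesis's: sections x[kL : kL+N]·w start at kL (harmless for (7.3), because Σ_k w(kL − n) = 1 for all n holds exactly when Σ_k w(n − kL) = 1 for all n), the window is a periodic Hann window with L = N/2, which satisfies (7.3), the DFT length equals N, and rule (7.2) is applied to squared DFT magnitudes before taking a square root, following the classical procedure of section 7.2. The helper names are hypothetical. With a zero noise spectrum the sketch returns its input, which checks the overlap-add step.

```python
import numpy as np


def periodic_hann(N):
    """Periodic Hann window; with L = N // 2 its shifted copies sum to exactly 1."""
    n = np.arange(N)
    return 0.5 - 0.5 * np.cos(2 * np.pi * n / N)


def satisfies_73(w, L, tol=1e-12):
    """Check condition (7.3) by folding the window into one L-sample period."""
    total = np.zeros(L)
    for start in range(0, len(w), L):
        chunk = w[start:start + L]
        total[:len(chunk)] += chunk
    return np.allclose(total, 1.0, rtol=0.0, atol=tol)


def short_time_subtract(x, Pe, N, L, alpha=2.0):
    """Hypothetical sketch of standard short-time spectral subtraction (noisy phase kept).

    Pe -- noise power per rfft bin of an N-point windowed section (assumed scale)
    """
    w = periodic_hann(N)
    if not satisfies_73(w, L):
        raise ValueError("window and spacing must satisfy (7.3)")
    xp = np.concatenate([np.zeros(N), x, np.zeros(N)])   # every x sample fully covered
    K = (len(xp) - N) // L + 1
    out = np.zeros(len(xp))
    for k in range(K):
        X = np.fft.rfft(xp[k * L:k * L + N] * w)
        P = np.abs(X) ** 2
        S_hat = np.sqrt(np.where(P > alpha * Pe, P - alpha * Pe, 0.0))   # rule (7.2)
        D = S_hat * np.exp(1j * np.angle(X))                # combine with noisy phase
        out[k * L:k * L + N] += np.fft.irfft(D, N)          # add the sections together
    return out[N:N + len(x)]


rng = np.random.default_rng(0)
x = rng.standard_normal(500)
print(np.allclose(short_time_subtract(x, 0.0, 64, 32), x))   # True: no noise removed
```

A symmetric Hamming window with L = N/2 fails `satisfies_73`, which is the situation in which the additional processing mentioned above would be needed.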

7.3 Magnitude-Only Short-Time Spectral Subtraction

In the previous section, we introduced the standard short-time spectral subtraction technique for noise reduction. In this section we introduce a different short-time implementation of spectral subtraction that uses results on signal estimation from short-time spectral magnitude. The principal difference from the standard implementation is that this technique does not require the short-time spectral phase of the noisy signal. Instead, a phase function is estimated from the processed short-time spectral magnitude.

As in the previous section, we consider the processing of a discrete-time signal $x(n)$ which is the sum of a desired signal $s(n)$ and an uncorrelated stationary noise signal $e(n)$ with known power spectral density $P_e(\omega)$. The initial processing of the short-time spectral magnitude $S_w(nL,\omega)$ of $x(n)$ is identical to that performed in the standard technique of section 7.2. Specifically, we obtain a modified short-time spectral magnitude given by

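$$\hat{S}_w(nL,\omega) = \begin{cases} S_w(nL,\omega) - \alpha P_e(\omega) & \text{if } S_w(nL,\omega) > \alpha P_e(\omega) \\ 0 & \text{otherwise} \end{cases}$$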

where the parameter $\alpha$ controls the degree of noise smoothing to be achieved. The next step in the standard technique is to combine $\hat{S}_w(nL,\omega)$ with the short-time spectral phase of the noisy signal $x(n)$. However, from chapter 5 we know that we can obtain a signal estimate directly from the modified short-time spectral magnitude $\hat{S}_w(nL,\omega)$. The signal estimation algorithms of the previous chapters require a-priori knowledge of $L$ consecutive samples of the signal, starting from its first non-zero sample. Our approach, as described in chapter 5, is to use some reasonable estimate for those samples. For example, one approach is to use the corresponding $L$ samples of the noisy signal $x(n)$. This has produced reasonable results in the processing of noisy speech.
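
The magnitude-only idea can be illustrated with an iterative estimator. This sketch does not implement the sequential technique of chapter 4. It swaps in a Griffin-Lim-style least-squares iteration, which at each step does two things:

- keeps the phase of the current estimate's short-time spectrum and imposes the processed magnitude;
- solves for the time signal by a per-sample weighted average.

Following the approach above, the first `n_known` samples are held fixed at assumed values, for example samples of the noisy $x(n)$. The window, hop and all names are assumptions for illustration.

```python
import numpy as np

def hann_shifted(N):
    """Half-sample-shifted Hann window: nonzero everywhere, overlap-adds to 1 at hop N/2."""
    return 0.5 - 0.5 * np.cos(2 * np.pi * (np.arange(N) + 0.5) / N)

def stft(x, w, L):
    """Short-time spectra of x with window w at frame starts 0, L, 2L, ..."""
    N = len(w)
    starts = np.arange(0, len(x) - N + 1, L)
    return np.array([np.fft.rfft(w * x[s:s + N]) for s in starts]), starts

def estimate_from_magnitude(target_mag, x_init, n_known, w, L, n_iter=50):
    """Iteratively estimate a signal whose short-time magnitude approximates target_mag.

    target_mag : (frames, N//2 + 1) processed short-time spectral magnitude (N even)
    x_init     : signal whose first n_known samples are held fixed (e.g. noisy x(n))
    Returns the estimate and the spectral distance recorded before each update.
    """
    N = len(w)
    wts = np.full(N // 2 + 1, 2.0)  # rfft bins 1..N/2-1 stand for two DFT bins
    wts[0] = wts[-1] = 1.0          # DC and Nyquist bins appear once
    x = np.zeros(len(x_init))
    x[:n_known] = x_init[:n_known]
    errors = []
    for _ in range(n_iter):
        X, starts = stft(x, w, L)
        errors.append(np.sum(wts * (np.abs(X) - target_mag) ** 2))
        # Keep the current phase estimate, impose the processed magnitude.
        Y = target_mag * np.exp(1j * np.angle(X))
        # Least-squares signal for the modified sections (per-sample weighted average).
        num = np.zeros(len(x))
        den = np.zeros(len(x))
        for Yk, s in zip(Y, starts):
            num[s:s + N] += w * np.fft.irfft(Yk, n=N)
            den[s:s + N] += w ** 2
        covered = den > 0
        x[covered] = num[covered] / den[covered]
        x[:n_known] = x_init[:n_known]   # re-impose the assumed initial samples
    return x, errors
```

Both steps minimize the same spectral distance, so the recorded error should not increase from one iteration to the next.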

In our experiments with magnitude-only short-time spectral subtraction, we have applied the sequential iterative technique of chapter 4 for signal estimation from the processed short-time spectral magnitude. We selected this technique because it is simple to implement. Furthermore, as indicated in chapter 4, it performs well compared to the other sequential reconstruction techniques tested in this thesis.

For noise reduction in speech signals, our experiments suggest that the performance of magnitude-only short-time spectral subtraction is comparable to that of standard short-time spectral subtraction. For relatively high signal to noise ratios (above 10 dB), both techniques significantly reduce the noise without any appreciable degradation in speech quality. Figure 7.1 shows the waveform of the sentence "The bowl dropped from his hand" (see Figure 4.3 for the original waveform) in additive white noise, giving a signal to noise ratio of 15 dB. Figure 7.2 shows the results of processing that waveform with standard as well as magnitude-only spectral subtraction. In both cases, a 128-point triangular analysis window was used. Clearly, both processed waveforms in Figure 7.2 have significantly reduced noise levels.


Fig. 7.1 Test Sentence in 15 dB Additive White Noise


Fig. 7.2b Magnitude-Only Short-Time Spectral Subtraction

For signal to noise ratios below 10 dB, both versions of spectral subtraction introduce significant processing artifacts in the signal. In the following sections, we describe these artifacts, discuss their causes, and present some techniques for suppressing them.


7.4  Artifacts  in  Short-Time  Spectral  Subtraction 

When short-time spectral subtraction is applied to signals with low signal to noise ratios, such as below 10 dB, certain processing artifacts are generally observed. In Figure 7.4, we illustrate these artifacts in an image that has been processed with standard short-time spectral subtraction. This image was obtained by adding white noise to the image of Figure 7.3 at a signal to noise ratio of 6 dB, and then processing it with the two-dimensional version of standard short-time spectral subtraction. Two types of distortion are evident in Figure 7.4. One is an apparently harmonic pattern, particularly in the large high-brightness region of the picture. Also noticeable are 'ripple' blurring effects near high-contrast sharp edges, such as between the dock and the background. Although generally not as severe as the harmonic pattern, this is also a quality-limiting artifact. Similar distortions appear when standard short-time spectral subtraction is applied to speech. In this case, the processing typically produces objectionable short tone bursts of varying frequency. In our experiments with magnitude-only spectral subtraction applied to speech, we have observed the same artifacts. In this section we discuss the causes of these artifacts. In the next section, we propose various techniques for suppressing them. In particular, the proposed techniques can be incorporated into both the standard and the magnitude-only versions of short-time spectral subtraction.


Fig.  7.3  Original  Image 


Fig.  7.4  Spectral  Subtraction  Artifacts 

Experiments with short-time spectral subtraction on noisy speech as well as image data indicate that the objectionable harmonic pattern artifacts arise primarily from a few large-amplitude narrowband peaks of noise energy. These peaks remain in the spectrum of each short-time section after spectral subtraction. Specifically, the spectral magnitudes of short-time sections of the noise signal $e(n)$ deviate randomly about the assumed power spectrum $P_e(\omega)$, and such deviations leave residues of noise energy after spectral subtraction. For wideband random noise with values of $\alpha$ in (6.1) in the range generally used [6,24], the residues tend to be dominated by a few narrowband peaks of relatively large amplitude. Of these peaks, the most undesirable are those at frequencies where there is little or no signal energy. These give rise to harmonic variations in the short-time sections. Since the noise component of the spectrum deviates independently from one short-time section to the next, the dominant frequency of the harmonic pattern also changes randomly from section to section. The resulting artifact is clearly apparent in the image of Figure 7.4. In the next section, we propose specific techniques for suppressing this artifact. One of these techniques, referred to as multi-window spectral smoothing, also reduces the rippling effect near large discontinuities. This rippling artifact is due to the inherent blurring associated with signal characteristics that change rapidly relative to the duration of the analysis window.
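
The mechanism described above can be reproduced with pure white noise. This is a minimal sketch, assuming Gaussian noise, a triangular window, and subtraction on the squared magnitude with $P_e$ set to the expected short-time power $\sum_n w^2(n)$. Under these assumptions, each frequency bin's short-time power is approximately exponentially distributed about $P_e$. A fraction of roughly $e^{-\alpha}$ of the bins (about 13.5% for $\alpha = 2$) therefore survives subtraction as isolated peaks, and the strongest peak falls at a different, random frequency in each section. The function name is hypothetical.

```python
import numpy as np

def residual_peaks(alpha, N=128, frames=2000, seed=0):
    """Subtract alpha * P_e from the short-time power of pure white noise and
    report how often a bin survives and which bin dominates each section."""
    rng = np.random.default_rng(seed)
    n = np.arange(N)
    w = 1.0 - np.abs(n - (N - 1) / 2.0) / (N / 2.0)   # triangular analysis window
    P_e = np.sum(w ** 2)          # expected |X_w(w)|^2 for unit-variance white noise
    sections = rng.standard_normal((frames, N))       # independent noise sections
    power = np.abs(np.fft.rfft(w * sections, axis=1)) ** 2
    residue = np.where(power > alpha * P_e, power - alpha * P_e, 0.0)
    surviving_fraction = np.mean(residue > 0)
    dominant_bin = np.argmax(residue, axis=1)         # strongest residue per section
    return surviving_fraction, dominant_bin

frac, dominant = residual_peaks(2.0)
print(frac, len(np.unique(dominant)))
```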

7.5  Artifact  Suppression  Techniques 

In the previous section, we observed two artifacts associated with short-time spectral subtraction in both its standard and magnitude-only implementations. The most prominent artifact is the harmonic pattern clearly visible in the processed image of Figure 7.4. The other artifact, more important for images than for speech, is a rippling effect near large discontinuities such as those at object boundaries. In this section, we propose three techniques for suppressing the harmonic pattern artifact. One of them, multi-window spectral smoothing, also reduces the rippling effect at sharp discontinuities. Throughout the remainder of this section, the term short-time spectral subtraction without any other qualification refers to both the standard and the magnitude-only implementations.

Multipass  Spectral  Subtraction 

This artifact suppression technique consists of repeated application of the entire short-time spectral subtraction procedure. Specifically, on each of a total of $K$ passes, $\alpha/K$ is used in (6.1) and a new estimate of $s(n)$ is obtained. Each pass uses the estimate of $s(n)$ from the previous pass as its input.
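
The multipass procedure can be sketched as follows. This is a minimal sketch, assuming each pass is the standard noisy-phase implementation with subtraction on the squared magnitude, a triangular window at hop $N/2$, and $P_e$ given on the `rfft` grid. The same $P_e$ is used on every pass, only $\alpha$ is divided by $K$, and the function names are hypothetical.

```python
import numpy as np

def triangular_window(N):
    """Triangular window of even length N; overlap-adds to 1 at hop N/2."""
    n = np.arange(N)
    return 1.0 - np.abs(n - (N - 1) / 2.0) / (N / 2.0)

def one_pass(x, P_e, alpha, N=128):
    """One pass of standard short-time spectral subtraction (noisy phase kept)."""
    L = N // 2
    w = triangular_window(N)
    xp = np.concatenate([np.zeros(L), np.asarray(x, dtype=float), np.zeros(N)])
    y = np.zeros(len(xp))
    for s in range(0, len(xp) - N + 1, L):
        X = np.fft.rfft(w * xp[s:s + N])
        power = np.abs(X) ** 2
        power_hat = np.where(power > alpha * P_e, power - alpha * P_e, 0.0)
        y[s:s + N] += np.fft.irfft(np.sqrt(power_hat) * np.exp(1j * np.angle(X)), n=N)
    return y[L:L + len(x)]

def multipass_spectral_subtraction(x, P_e, alpha=2.0, K=4, N=128):
    """K passes, each subtracting (alpha / K) * P_e and each taking the
    previous pass's estimate as its input."""
    estimate = np.asarray(x, dtype=float)
    for _ in range(K):
        estimate = one_pass(estimate, P_e, alpha / K, N)
    return estimate
```

Each pass maps back to the time domain before the next one begins. This is the step conjectured below to smooth spectral magnitudes across overlapping sections.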

The key to this procedure seems to lie in the post-subtraction mapping to the time domain at the end of each pass. Based on experiments conducted for this thesis, we conjecture that the mapping to the time domain smooths the spectral magnitudes of overlapping short-time sections. Thus, as noise energy is subtracted, smoothing takes place simultaneously between overlapping sections. Furthermore, as $K$ increases, the spectrum of each short-time section begins to affect the smoothing of the spectral magnitudes of more distant sections. The idea of smoothing between the spectral magnitudes of different short-time sections is explored more directly in the Neighborhood Smoothing technique described next.

Neighborhood  Smoothing 

This approach is based on the assumption that the spectral magnitudes of neighboring short-time sections of the degraded signal deviate more from each other than those of corresponding sections of the undegraded signal. Thus, if the spectral magnitudes of neighboring sections of the noisy signal are averaged, the effect of the noise is, in principle, reduced. In effect, this corresponds to time smoothing of the short-time spectral magnitude. The neighborhood smoothing can be carried out using either linear smoothing or median smoothing [26] techniques.
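
Neighborhood smoothing of a short-time magnitude array can be sketched as a running mean (linear smoothing with a rectangular kernel) or a running median across successive sections at each frequency. This is a minimal sketch. The neighborhood `radius` and the edge handling (repeating the first and last sections) are assumptions, and the function name is hypothetical.

```python
import numpy as np

def neighborhood_smooth(mag, radius=1, method="median"):
    """Smooth short-time spectral magnitudes across neighboring sections.

    mag    : array (frames, bins); rows are successive short-time sections
    radius : number of neighboring sections on each side to include
    method : "median" or "mean" (linear smoothing with a rectangular kernel)
    """
    mag = np.asarray(mag, dtype=float)
    padded = np.pad(mag, ((radius, radius), (0, 0)), mode="edge")
    # Stack the 2*radius + 1 time-shifted copies along a new leading axis.
    stack = np.stack([padded[i:i + mag.shape[0]] for i in range(2 * radius + 1)])
    if method == "median":
        return np.median(stack, axis=0)
    if method == "mean":
        return np.mean(stack, axis=0)
    raise ValueError("method must be 'median' or 'mean'")

mag = np.array([[1.0], [1.0], [9.0], [1.0], [1.0]])  # one isolated residual peak
print(neighborhood_smooth(mag).ravel())               # [1. 1. 1. 1. 1.]
```

The median removes an isolated peak that appears in only one section. The mean only spreads it over neighboring sections.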

Multi-Window Smoothing

This technique capitalizes on the flexibility in the choice of the analysis window. In its most general form, the idea is to obtain signal estimates using different analysis windows for the short-time spectrum (but the same amount of spectral subtraction), followed by some kind of spectral smoothing between the different estimates. In particular, we have found that using the same window shape at shifted locations for each estimate is very successful. In the experiments conducted for this thesis, the short-time spectral magnitudes of the various estimates were median averaged in the final step, using a rectangular analysis window.
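
The final combining step can be sketched as follows. The individual estimates are produced by running a frame-based estimator with its frame grid shifted by different amounts. Their short-time magnitudes, computed here with a rectangular window, are then median averaged. This is a minimal sketch under several assumptions:

- `process` is a hypothetical callable standing in for one spectral subtraction run;
- shifting the frame grid is done by delaying the input;
- the frame length and hop are illustrative.

Mapping the combined magnitude back to a signal (for example with a magnitude-only estimator) is not shown.

```python
import numpy as np

def stft_magnitude(x, N=128, L=64):
    """Short-time spectral magnitude using a rectangular analysis window."""
    starts = range(0, len(x) - N + 1, L)
    return np.array([np.abs(np.fft.rfft(x[s:s + N])) for s in starts])

def shifted_estimates(x, process, shifts):
    """Run a frame-based estimator with its frame grid offset by each shift.

    process : hypothetical callable mapping a signal to an estimate of the
              same length (e.g. one short-time spectral subtraction run)
    """
    x = np.asarray(x, dtype=float)
    out = []
    for d in shifts:
        delayed = np.concatenate([np.zeros(d), x])   # moves the frame grid by d samples
        out.append(process(delayed)[d:d + len(x)])    # undo the delay
    return out

def median_combine(estimates, N=128, L=64):
    """Median, across estimates, of their short-time spectral magnitudes."""
    mags = np.stack([stft_magnitude(np.asarray(e, dtype=float), N, L) for e in estimates])
    return np.median(mags, axis=0)
```

Because the median is taken independently in each time-frequency cell, a residual peak present in only a minority of the shifted estimates is discarded.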

All three techniques described above significantly reduce the harmonic pattern artifact. Furthermore, the multi-window technique also succeeds in reducing the rippling artifacts at sharp discontinuities. The performance of these techniques is illustrated in Figure 7.5. The image on the left side of the figure is the same as that shown in Figure 7.4. It was processed with standard short-time spectral subtraction without any modifications for artifact suppression. The other image in Figure 7.5 shows the effect of short-time spectral subtraction implemented with the multipass and multi-window procedures. Specifically, multi-window spectral smoothing is carried out in each pass of the multipass implementation of short-time spectral subtraction. It is apparent from Figure 7.5 that the artifact suppression procedures are quite successful in reducing the harmonic pattern as well as the rippling effects near high-contrast edges. We have also applied these techniques to noise reduction in speech processing, and find that the objectionable short tone bursts of varying frequency are significantly suppressed.


Fig.  7.5  Standard  and  Improved  Spectral  Subtraction 

CHAPTER EIGHT: CONCLUSIONS

In this thesis, we have shown that discrete-time signal processing can be accomplished using only the magnitude of the short-time spectrum. In particular, large classes of signals were found to be uniquely representable by the short-time spectral magnitude under conditions that are often satisfied in practical applications. Furthermore, several algorithms were derived for reconstructing a discrete-time signal from samples of its short-time spectral magnitude. These include algorithms designed to yield reasonable signal estimates from a processed short-time spectral magnitude that does not correspond to the short-time spectral magnitude of any signal. This is an important result, since almost any kind of processing violates the structure imposed by the definition of the short-time spectral magnitude.

To illustrate the practical usefulness of the results in this thesis, we considered the problems of noise reduction and time-scale modification of speech. The magnitude-only short-time spectral processing technique we have developed for time-scale modification is considerably simpler and computationally more efficient than previous short-time spectral processing techniques. Furthermore, in terms of speech quality, the magnitude-only technique appears comparable to the other techniques. In the case of noise reduction, standard short-time spectral processing techniques generally modify only the magnitude of the short-time spectrum, so the short-time spectral phase of the noisy signal is retained in the processed signal. It is therefore of interest to develop techniques that estimate a processed short-time spectral phase. One approach is to estimate the processed phase directly from the processed short-time spectral magnitude. This is easily accomplished with the techniques developed in this thesis for signal estimation from processed short-time spectral magnitude. Our initial experiments on such noise reduction in speech signals have given results that appear comparable to those obtained with traditional short-time spectral processing techniques.

This result is potentially useful in designing systems for combined noise and bandwidth reduction of speech. Such systems perform bandwidth compression on a noisy signal, transmit it over a possibly noisy channel, and finally estimate the original undegraded signal. The results in this thesis may be used to achieve bandwidth compression of the noisy signal by efficiently coding its short-time spectral magnitude. Time-scale modification may also be used at this stage. Once the transmitted signal has been received at the other end, magnitude-only short-time spectral processing may be applied for noise reduction. This is an important application which deserves more research in the future.

A considerable amount of theoretical and applied research remains to be pursued in light of the results presented in this thesis. The most obvious problem is to use these results in application areas other than those considered here. This includes other applications within speech processing, such as vocoder design, as well as applications in other areas such as image, acoustical, and geophysical signal processing. In the theoretical realm, it is of interest to further extend the conditions under which a signal is uniquely specified by its short-time spectral magnitude. For example, the uniqueness conditions derived in this thesis for signal representation with short-time spectral magnitude generally require knowledge of a few initial samples of the signal. It is of interest to determine other ways of guaranteeing unique signal specification that require a different type of information about the signal. It should be observed that the need for the initial samples condition was established through a counterexample based on a special class of signals and a rectangular analysis window. The question is whether excluding the rectangular analysis window and that special class of signals is in fact sufficient to remove the requirement of a-priori knowledge of the initial samples.

All the algorithms for signal estimation from short-time spectral magnitude that were implemented in this thesis estimated short-time sections of the signal in sequential order. However, it was indicated that improved performance may be obtained by extrapolating several short-time sections simultaneously. For example, such algorithms may be less sensitive to errors in the knowledge of the initial signal samples. The implementation and study of simultaneous extrapolation algorithms should therefore be an important part of further research.